ABSTRACT. The most frequently used condition for sampling matrices employed in compressive sampling is the restricted isometry property (RIP) of the matrix when restricted to sparse signals. At the same time, imposing this condition makes it difficult to find explicit matrices that support recovery of signals from sketches of the optimal (smallest possible) dimension. A number of attempts have been made to relax or replace the RIP property in sparse recovery algorithms. We focus on the relaxation under which the near-isometry property holds for most rather than for all submatrices of the sampling matrix, known as the statistical RIP or StRIP condition. We show that sampling matrices of dimensions $m \times N$ with maximum coherence $\mu = O((k \log^3 N)^{-1/4})$ and mean square coherence $\bar\mu^2 = O(1/(k \log N))$ support stable recovery of $k$-sparse signals using Basis Pursuit. These assumptions are satisfied in many examples. As a result, we are able to construct sampling matrices that support recovery with low error for sparsity $k$ higher than $\sqrt{m}$, which exceeds the range of parameters of the known classes of RIP matrices.
1. Introduction

One of the important problems in the theory of compressed sampling is the construction of sampling operators that support algorithmic procedures of sparse recovery. A universal sufficient condition for stable reconstruction is given by the restricted isometry property (RIP) of sampling matrices [14]. It has been shown that sparse signals compressed to low-dimensional images using linear RIP maps can be reconstructed using $\ell_1$ minimization procedures such as Basis Pursuit and Lasso [20, 18, 14, 11].

Let $x$ be an $N$-dimensional real signal that has a sparse representation in a suitably chosen basis. We will assume that $x$ has $k$ nonzero coordinates (it is a $k$-sparse vector) or is approximately sparse in the sense that it has at most $k$ significant coordinates, i.e., entries of large magnitude compared to the other entries. The observation vector $y$ is formed as a linear transformation of $x$, i.e.,

$y = \Phi x + z,$

where $\Phi$ is an $m \times N$ real matrix, $m \ll N$, and $z$ is a noise vector. We assume that $z$ has bounded energy (i.e., $\|z\|_2 \le \varepsilon$). The objective of the estimator is to find a good approximation of the signal $x$ after observing $y$. This is obviously impossible for general signals $x$ but becomes tractable if we seek a sparse approximation $\hat x$ which satisfies



(1) $\|x - \hat x\|_p \le C_1 \min_{x' \text{ is } k\text{-sparse}} \|x - x'\|_q + C_2 \varepsilon$



for some $p, q \ge 1$ and constants $C_1, C_2$. Note that if $x$ itself is $k$-sparse, then (1) implies that the recovery error $\|x - \hat x\|_p$ is at most proportional to the norm of the noise. Moreover, it implies that the recovery is stable in the sense that if $x$ is approximately $k$-sparse then the recovery error is small. If the estimate satisfies an inequality of the type (1), we say that the recovery procedure satisfies a $(p, q)$ error guarantee.

Among the most studied estimators is the Basis Pursuit algorithm [23]. This is an $\ell_1$-minimization algorithm that provides an estimate of the signal through solving a convex programming problem

(2) $\hat x = \arg\min \|\tilde x\|_1$ subject to $\|\Phi \tilde x - y\|_2 \le \varepsilon.$

Basis Pursuit is known to provide both $(\ell_1, \ell_1)$ and $(\ell_2, \ell_1)$ error guarantees under the conditions on $\Phi$ discussed in the next section.
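
In the noiseless case $\varepsilon = 0$, problem (2) can be recast as a linear program, which is why Basis Pursuit is also referred to as linear programming decoding. The block below is the standard textbook reformulation, not a construction taken from this paper; the auxiliary vector $t$ is introduced here only for illustration.

```latex
\begin{aligned}
\min_{\tilde{x},\, t \in \mathbb{R}^N} \quad & \sum_{i=1}^{N} t_i \\
\text{subject to} \quad & -t_i \le \tilde{x}_i \le t_i, \qquad i = 1, \dots, N, \\
& \Phi \tilde{x} = y .
\end{aligned}
```

At an optimal point $t_i = |\tilde x_i|$ for every $i$, so the $\tilde x$-component of the solution minimizes $\|\tilde x\|_1$ over the affine set $\{\tilde x : \Phi \tilde x = y\}$.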

Another popular estimator for which the recovery guarantees are proved using coherence properties of the sampling matrix $\Phi$ is Lasso [45, 23]. Assume the vector $z$ is independent of the signal and formed of independent identically distributed Gaussian random variables with zero mean and variance $\sigma^2$. Lasso is a regularization of the $\ell_0$ minimization problem, written as the optimization problem (3).

Here $\lambda_N$ is a regularization parameter which controls the complexity (sparsity) of the optimizer.
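
To make the Lasso objective concrete, the following is a minimal sketch of a generic proximal gradient (ISTA) solver for a problem of the form $\frac12\|\Phi \tilde x - y\|_2^2 + \tau \|\tilde x\|_1$. ISTA is a standard solver swapped in for illustration and is not the procedure analyzed in this paper; the names `soft_threshold`, `lasso_ista`, and the parameter `tau` (standing for the product $\lambda_N \sigma^2$) are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    """Entrywise soft-thresholding, the proximal map of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_ista(Phi, y, tau, n_iter=500):
    """Minimize 0.5 * ||Phi x - y||_2^2 + tau * ||x||_1 by proximal gradient steps.

    tau is a stand-in for the regularization weight (an assumption of this
    sketch); ISTA is a generic solver, not the paper's own method.
    """
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)         # gradient of the quadratic term
        x = soft_threshold(x - grad / L, tau / L)
    return x

# Sanity check with an orthonormal Phi, where the minimizer is soft_threshold(y, tau).
y_demo = np.array([3.0, -0.5, 0.2, -2.0])
x_demo = lasso_ista(np.eye(4), y_demo, tau=1.0)
```

For an orthonormal sampling matrix the iteration reaches the closed-form minimizer after one step, which gives a quick way to check the implementation before applying it to a general $\Phi$.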

Compressed sensing is just one of a large group of applications of solutions of severely ill-defined problems under the sparsity assumption. An extensive recent overview of such applications is given in [10]. It is this multitude of concrete applications that makes the study of sparse recovery such an appealing area of signal processing and applied statistics.

1.1. Properties of sampling matrices. One of the main questions related to sparse recovery is the derivation of sufficient conditions for the convergence and error guarantees of the reconstruction algorithms. Here we discuss some properties of sampling matrices that are relevant to our results, focusing on incoherence and near-isometry of random submatrices of the sampling matrix.

Let $\Phi$ be an $m \times N$ real matrix and let $\phi_1, \dots, \phi_N$ be its columns. Without loss of generality throughout this paper we assume that the columns are unit-length vectors. Let $[N] = \{1, 2, \dots, N\}$ and let $I = \{i_1, \dots, i_k\} \subset [N]$ be a $k$-subset of the set of coordinates. By $\mathcal{P}_k(N)$ we denote the set of all $k$-subsets of $[N]$. Below we write $\Phi_I$ to refer to the $m \times k$ submatrix of $\Phi$ formed of the columns with indices in $I$. Given a vector $x \in \mathbb{R}^N$, we denote by $x_I$ a $k$-dimensional vector given by the projection of the vector $x$ on the coordinates in $I$.

It is known that at least $m = \Omega(k \log(N/k))$ samples are required for any recovery algorithm with an error guarantee of the form (1) (see for example [36, 37]). Matrices with random Gaussian or Bernoulli entries with high probability provide the best known error guarantees from the sketch dimension that matches this lower bound [20, 21, 19]. The estimates become more conservative once we try to construct sampling matrices explicitly.

We say that $\Phi$ satisfies the coherence property if the inner product $|\langle \phi_i, \phi_j \rangle|$ is uniformly small, and call $\mu = \max_{i \ne j} |\langle \phi_i, \phi_j \rangle|$ the coherence parameter of the matrix. The importance of incoherent dictionaries has been recognized in a large number of papers on compressed sensing, among them [46, 49, 30, 17, 15, 16, 11]. The coherence condition plays an essential role in proofs of recovery guarantees in these and many other studies. We also define the mean square coherence and the maximum average square coherence of the dictionary:

$\bar\mu^2 = \frac{1}{N(N-1)} \sum_{i \ne j} \mu_{ij}^2, \qquad \mu_{\max}^2 = \max_{1 \le j \le N} \frac{1}{N-1} \sum_{i \ne j} \mu_{ij}^2, \qquad \text{where } \mu_{ij} = |\langle \phi_i, \phi_j \rangle|.$

Of course, $\bar\mu^2 \le \mu_{\max}^2$ with equality if and only if for every $j$ the sum in $\mu_{\max}^2$ takes the same value. Our reliance on two coherence parameters of the sampling matrix $\Phi$ resembles somewhat the approach in [3, 4]; however, unlike those papers, our results imply recovery guarantees for Basis Pursuit. Our proof methods are also materially different from these works. More details are provided below in this section where we comment on previous results.
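
The three coherence parameters are easy to compute for a small dictionary. The sketch below assumes that the mean square coherence is the average of $\mu_{ij}^2$ over all ordered pairs $i \ne j$ and that the maximum average square coherence is the largest column-wise average of $\mu_{ij}^2$; the function name `coherence_parameters` and the test dictionary (identity columns joined with a scaled $4 \times 4$ Hadamard matrix) are chosen here for illustration only.

```python
import numpy as np

def coherence_parameters(Phi):
    """Return (mu, mean square coherence, maximum average square coherence)."""
    Phi = Phi / np.linalg.norm(Phi, axis=0)      # enforce unit-norm columns
    G = np.abs(Phi.T @ Phi)                      # entries mu_ij = |<phi_i, phi_j>|
    np.fill_diagonal(G, 0.0)                     # ignore the diagonal i = j
    N = G.shape[0]
    col_avg = (G ** 2).sum(axis=0) / (N - 1)     # average of mu_ij^2 over i != j
    return G.max(), col_avg.mean(), col_avg.max()

# Illustrative dictionary: 4 identity columns plus 4 normalized Hadamard columns.
H = np.array([[1, 1, 1, 1],
              [1, -1, 1, -1],
              [1, 1, -1, -1],
              [1, -1, -1, 1]], dtype=float)
Phi = np.hstack([np.eye(4), H / 2.0])
mu, mean_sq, max_avg_sq = coherence_parameters(Phi)
```

In this example every column sees the same multiset of inner products, so the two square coherence parameters coincide, which is the equality case mentioned in the text.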

1.1.1. The RIP property. The matrix $\Phi$ satisfies the RIP property (is $(k, \delta)$-RIP) if inequality (4) holds for all $k$-sparse vectors $x$, where $\delta \in (0, 1)$ is a parameter. Equivalently, $\Phi$ is $(k, \delta)$-RIP if $\|\Phi_I^T \Phi_I - \mathrm{Id}\| \le \delta$ holds for all $I \subset [N]$, $|I| = k$, where $\|\cdot\|$ is the spectral norm and $\mathrm{Id}$ is the identity matrix. The RIP property provides a sufficient condition for the solution of (2) to satisfy the error guarantees of Basis Pursuit [20, 18, 14, 11]. In particular, by [14], $(2k, \sqrt{2} - 1)$-RIP suffices for both $(\ell_1, \ell_1)$ and $(\ell_2, \ell_1)$ error estimates, while [11] improves this to $(1.75k, \sqrt{2} - 1)$-RIP.

As is well known (see [46, 26]), coherence and RIP are related: a matrix with coherence parameter $\mu$ is $(k, (k-1)\mu)$-RIP. This connection has served as the starting point in a number of studies on constructing RIP matrices from incoherent dictionaries. To implement this idea one starts with a set of unit vectors $\phi_1, \dots, \phi_N$ with maximum coherence $\mu$. In other words, we seek a well-separated collection of lines through the origin in $\mathbb{R}^m$, or reformulating again, a good packing of the real projective space $\mathbb{R}P^{m-1}$. One way of constructing such packings begins with taking a set $C$ of binary $m$-dimensional vectors whose pairwise Hamming distances are concentrated around $m/2$. Call the maximum deviation from $m/2$ the width $w$ of the set $C$. An incoherent dictionary is obtained by mapping the bits of a small-width code to bipolar signals and normalizing. The resulting coherence and width are related by $w(C) = \mu m/2$.
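
The relation between width and coherence can be checked numerically. The sketch below maps a small binary code to a bipolar dictionary and compares the two quantities; the choice of the length-7 simplex code (together with the zero word) and the helper names `code_to_dictionary` and `width` are assumptions made for this illustration.

```python
import itertools
import numpy as np

def code_to_dictionary(C):
    """Map binary codewords (rows of C) to unit-norm bipolar columns of an m x N matrix."""
    C = np.asarray(C)
    m = C.shape[1]
    return ((1 - 2 * C) / np.sqrt(m)).T          # bit 0 -> +1, bit 1 -> -1, then normalize

def width(C):
    """Maximum deviation of pairwise Hamming distances from m/2."""
    C = np.asarray(C)
    m = C.shape[1]
    return max(abs(m / 2 - np.sum(a != b)) for a, b in itertools.combinations(C, 2))

# Length-7 simplex code: all 8 linear combinations of the rows of a 3 x 7 generator matrix.
G = np.array([[int(b) for b in format(j, "03b")] for j in range(1, 8)]).T
messages = np.array(list(itertools.product([0, 1], repeat=3)))
C = (messages @ G) % 2

Phi = code_to_dictionary(C)
gram = np.abs(Phi.T @ Phi)
np.fill_diagonal(gram, 0.0)
mu = gram.max()                                   # coherence of the dictionary
w = width(C)                                      # width of the code
```

Two bipolar columns at Hamming distance $d$ have inner product $1 - 2d/m$, which is the source of the identity $w(C) = \mu m / 2$.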



(3) $\hat x = \arg\min_{\tilde x \in \mathbb{R}^N} \frac{1}{2}\|\Phi \tilde x - y\|_2^2 + \lambda_N \sigma^2 \|\tilde x\|_1.$

(4) $(1-\delta)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1+\delta)\|x\|_2^2$

One of the first papers to put forward the idea of constructing RIP matrices from binary vectors was the work by DeVore [25]. While [25] did not make a connection to error-correcting codes, a number of later papers pursued both its algorithmic and constructive aspects [6, 12, 13, 24]. Examples of codes with small width are given in [2], where they are studied under the name of small-bias probability spaces. RIP matrices obtained from the constructions in [2] satisfy $m = O\big((k \log N / \log(k \log N))^2\big)$. Ben-Aroya and Ta-Shma [7] recently improved this to a bound with exponent $5/4$ in place of $2$ for $(\log N)^{-3/2} \le \mu \le (\log N)^{-1/2}$. The advantage of obtaining RIP matrices from binary or spherical codes is low construction complexity: in many instances it is possible to define the matrix using only $O(\log N)$ columns while the remaining columns can be computed as their linear combinations. We also note a result by Bourgain et al. [8] who gave the first (and the only known) construction of RIP matrices with $k$ on the order of $m^{1/2+\epsilon}$ (i.e., greater than $O(\sqrt{m})$). An overview of the state of the art in the construction of RIP matrices is given in a recent paper.

At the same time, in practical problems we still need to write out the entire matrix, so constructions of complexity $O(N)$ are an acceptable choice. Under these assumptions, the best tradeoff between $m$, $k$ and $N$ for RIP matrices based on codes and coherence is obtained from Gilbert-Varshamov type code constructions: namely, it is possible to construct $(k, \delta)$-RIP matrices with $m = 4(k/\delta)^2 \log N$. At the same time, already [2] observes that the sketch dimension in RIP matrices constructed from binary codes is at least $m = \Theta((k^2 \log N)/\log k)$.

1.1.2. Statistical incoherence properties. The limitations on incoherent dictionaries discussed in the previous section suggest relaxing the RIP condition. An intuitively appealing idea is to require that condition (4) hold for almost all rather than all $k$-subsets $I$, replacing RIP with a version of it, in which the near-isometry property holds with high probability with respect to the choice of $I \in \mathcal{P}_k(N)$. Statistical RIP (StRIP) matrices are arguably easier to construct, so they have the potential of supporting provable recovery guarantees from shorter sketches compared to the known constructive schemes relying on RIP.

A few words on notation. Let $[N] := \{1, 2, \dots, N\}$ and let $\mathcal{P}_k(N)$ denote the set of $k$-subsets of $[N]$. The usual notation for probability Pr is used to refer to a probability measure when there is no ambiguity. At the same time, we use separate notation for some frequently encountered probability spaces. In particular, we use $P_{R_k}$ to denote the uniform probability distribution on $\mathcal{P}_k(N)$. If we need to choose a random $k$-subset $I$ and a random index in $[N] \setminus I$, we use the notation $P_{R'_k}$. We use $P_{R^k}$ to denote any probability measure on $\mathbb{R}^k$ which assigns equal probability to each of the $2^k$ orthants (i.e., with uniformly distributed signs).

The following definition is essentially due to Tropp [49, 48], where it is called conditioning of random subdictionaries.

Definition 1. An $m \times N$ matrix $\Phi$ satisfies the statistical RIP property (is $(k, \delta, \varepsilon)$-StRIP) if inequality (5) holds for at least a $1 - \varepsilon$ proportion of all $k$-subsets $I$ of $[N]$ and for all $x \in \mathbb{R}^k$.

A related but different definition was given later in several papers such as [12, 3, 30] as well as some others. In these works, a matrix is called $(k, \delta, \varepsilon)$-StRIP if inequality (5) holds for at least a $1 - \varepsilon$ proportion of $k$-sparse unit vectors $z \in \mathbb{R}^N$. While several well-known classes of matrices were shown to have this property, it is not sufficient for sparse recovery procedures. Several additional properties as well as specialized recovery procedures that make signal reconstruction possible were investigated in [12].

In this paper we focus on the statistical isometry property as given by Def. 1 and mean this definition whenever we mention StRIP matrices. We note that condition (5) is scalable, so the restriction to unit vectors is not essential.
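
For a small dictionary the StRIP parameter $\varepsilon$ can be found by exhaustive enumeration of supports. The following sketch reads the StRIP event in its spectral form $\|\Phi_I^T \Phi_I - \mathrm{Id}\| \le \delta$ and counts the fraction of $k$-subsets that satisfy it; the function name `strip_fraction` and the example dictionary are assumptions for illustration, and the enumeration is only feasible for very small $N$ and $k$.

```python
import itertools
import numpy as np

def strip_fraction(Phi, k, delta):
    """Fraction of k-subsets I with ||Phi_I^T Phi_I - Id|| <= delta (spectral norm)."""
    N = Phi.shape[1]
    good = total = 0
    for I in itertools.combinations(range(N), k):
        sub = Phi[:, list(I)]
        dev = np.linalg.norm(sub.T @ sub - np.eye(k), 2)
        if dev <= delta + 1e-12:                 # small slack for rounding
            good += 1
        total += 1
    return good / total

# Illustrative dictionary: identity columns joined with normalized Hadamard columns.
H = np.array([[1, 1, 1, 1],
              [1, -1, 1, -1],
              [1, 1, -1, -1],
              [1, -1, -1, 1]], dtype=float)
Phi = np.hstack([np.eye(4), H / 2.0])

frac_04 = strip_fraction(Phi, k=2, delta=0.4)    # only orthogonal pairs pass
frac_05 = strip_fraction(Phi, k=2, delta=0.5)    # every pair passes
```

The dictionary is then $(k, \delta, \varepsilon)$-StRIP for any $\varepsilon$ at least one minus the returned fraction. Here the 12 orthogonal pairs out of 28 pass for $\delta = 0.4$, while the coherence bound $(k-1)\mu = 0.5$ is needed to cover all pairs.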

Definition 2. An $m \times N$ matrix $\Phi$ satisfies a statistical incoherence condition (is $(k, \alpha, \varepsilon)$-SINC) if condition (6) holds.

This condition is discussed in [29, 47], and more explicitly in [48]. Following [48], it appears in the proofs of sparse recovery in [13] and below in this paper. A somewhat similar average coherence condition was also introduced in [3, 5]. The reason that (6) is less restrictive than the coherence property is as follows. Collections of unit vectors with small coherence (large separation) cannot be too large so as not to contradict universal bounds on packings of $\mathbb{R}P^{m-1}$. At the same time, for the norm $\|\Phi_I^T \phi_i\|_2$ to be large it is necessary that a given column is close to the majority of the $k$ vectors from the set $I$, which is easier to rule out.
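
The SINC condition can be checked in the same exhaustive way on a toy example. The sketch below assumes that the SINC event bounds the squared norm, $\max_{i \notin I} \|\Phi_I^T \phi_i\|_2^2 \le \alpha$, which matches the way the parameter is used in the lemmas of Section 2; this normalization, the function name `sinc_fraction`, and the example dictionary are assumptions of the illustration.

```python
import itertools
import numpy as np

def sinc_fraction(Phi, k, alpha):
    """Fraction of k-subsets I with max over i outside I of ||Phi_I^T phi_i||_2^2 <= alpha."""
    N = Phi.shape[1]
    G = Phi.T @ Phi                              # Gram matrix of the columns
    good = total = 0
    for I in itertools.combinations(range(N), k):
        rows = list(I)
        outside = [i for i in range(N) if i not in I]
        worst = max(np.sum(G[rows, i] ** 2) for i in outside)
        if worst <= alpha + 1e-12:               # small slack for rounding
            good += 1
        total += 1
    return good / total

# Illustrative dictionary: identity columns joined with normalized Hadamard columns.
H = np.array([[1, 1, 1, 1],
              [1, -1, 1, -1],
              [1, 1, -1, -1],
              [1, -1, -1, 1]], dtype=float)
Phi = np.hstack([np.eye(4), H / 2.0])

frac_03 = sinc_fraction(Phi, k=2, alpha=0.3)     # only mixed pairs pass
frac_05 = sinc_fraction(Phi, k=2, alpha=0.5)     # every pair passes
```

In this example a support made of one identity column and one Hadamard column has every outside column correlated with only one of its two members, so it passes at the smaller threshold, while supports drawn from a single block do not.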



$P_{R_k}(\{I \in \mathcal{P}_k(N) : \|\Phi_I^T \Phi_I - \mathrm{Id}\| \le \delta\}) \ge 1 - \varepsilon.$

(5) $(1-\delta)\|x\|_2^2 \le \|\Phi_I x\|_2^2 \le (1+\delta)\|x\|_2^2$

(6) $P_{R_k}(\{I \in \mathcal{P}_k(N) : \max_{i \notin I} \|\Phi_I^T \phi_i\|_2^2 \le \alpha\}) \ge 1 - \varepsilon.$

Nevertheless, the above relaxed conditions are still restrictive enough to rule out many deterministic matrices: the problem is that for almost all supports $I$ we require that $\|\Phi_I^T \phi_i\|_2$ be small for all $i \notin I$. We observe that this condition can be further relaxed. Namely, let

$B(\Phi) = \{t \in \mathbb{R} : \exists I \in \mathcal{P}_k(N), i \in I^c \text{ such that } \|\Phi_I^T \phi_i\|_2 = t\}$

be the set of all values taken by the coherence parameter. Let us introduce the following definition.

Definition 3. An $m \times N$ matrix $\Phi$ is said to satisfy a weak statistical incoherence condition (to be a $(k, \delta, \alpha, \varepsilon)$-WSINC) if

(7) ]T Pr, ({(/,;),/ e r such that \\^&h = t})g(s,t) < , 

teB(*) 

where $g(\delta, t)$ is a positive increasing function of $t$ and

$A_\alpha(\Phi) = \{I \in \mathcal{P}_k(N) : \exists i \in I^c \text{ such that } \|\Phi_I^T \phi_i\|_2^2 > \alpha\}.$

We note that this definition is informative if $g(\delta, t) < 1$; otherwise, replacing it with 1 we get back the SINC condition. Below we use $g(\delta, t) = \exp(-(1-\delta)^2/(8t^2))$. This definition takes account of the distribution of values of the quantity $\|\Phi_I^T \phi_i\|_2$ for different choices of the support and a column $\phi_i$ outside it. For binary dictionaries, the WSINC property relies on a distribution of sums of Hamming distances between a column and a collection of $k$ columns, taken with weights that decrease as the sum increases.

Definition 4. We say that a signal $x \in \mathbb{R}^N$ is drawn from a generic random signal model $S_k$ if

1) The locations of the $k$ coordinates of $x$ with largest magnitudes are chosen among all $k$-subsets $I \subset [N]$ with a uniform distribution;

2) Conditional on $I$, the signs of the coordinates $x_i, i \in I$ are i.i.d. uniform Bernoulli random variables taking values in the set $\{1, -1\}$.

Using the previously defined notation, the probability induced by the generic model $P_{S_k}$ can be decomposed as $P_{R_k} \times P_{R^k}$.
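
A signal from the generic random model can be simulated as follows. The model only constrains the location of the $k$ largest coordinates and their signs, so the magnitudes used below (uniform on $[1, 2]$ on the support and at most `tail_scale` elsewhere) are arbitrary choices made for this sketch, as is the function name `sample_generic_signal`.

```python
import numpy as np

def sample_generic_signal(N, k, rng, tail_scale=0.0):
    """Draw x with a uniformly random k-subset of dominant coordinates and i.i.d. signs.

    tail_scale must be below 1 so that the chosen support really holds the
    k largest magnitudes; the magnitude distributions are illustrative.
    """
    support = rng.choice(N, size=k, replace=False)    # uniform k-subset of [N]
    signs = rng.choice([-1.0, 1.0], size=k)           # i.i.d. uniform signs
    magnitudes = 1.0 + rng.random(k)                  # values in [1, 2)
    x = tail_scale * rng.uniform(-1.0, 1.0, N)        # small off-support entries
    x[support] = signs * magnitudes
    return x, np.sort(support)

rng = np.random.default_rng(0)
x, I = sample_generic_signal(N=50, k=5, rng=rng, tail_scale=0.1)
```

Setting `tail_scale=0.0` produces exactly $k$-sparse signals, while a small positive value produces approximately sparse ones.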

1.2. Contributions of this paper. Our results are as follows. First, we show that a combination of the StRIP and SINC conditions suffices for stable recovery of sparse signals. In large part, these results are due to [49]. We incorporate some additional elements such as stability analysis of Basis Pursuit based on these assumptions and give the explicit values of the constants involved in the assumptions. We also show that the WSINC condition together with StRIP is sufficient for bounding the off-support error of Basis Pursuit.

One of the main results of [49, 48] is a sufficient condition for a matrix to act nearly isometrically on most sparse vectors. Namely, an $m \times N$ matrix $\Phi$ is $(k, \delta, \varepsilon = k^{-s})$-StRIP if

$\sqrt{s \mu^2 k \log(k+1)} + \frac{k}{N}\|\Phi\|^2 \le c\delta,$

where $s \ge 1$ and $c$ is a constant; see [49], Theorem B. For this condition to be applicable, one needs that $\mu = O(1/\sqrt{k \log(1/\varepsilon)})$. For sampling matrices that satisfy this condition, we obtain a near-optimal relation $m = O(k \log(N/\varepsilon))$ between the parameters. Some examples of this kind are given below in Sect. 5. As one of our main results, we extend the region of parameters that suffice for $(k, \delta, \varepsilon)$-StRIP. Namely, in Theorem 4.7 we prove that it is enough to have the relation $\mu = O(1/\sqrt[4]{k \log k \log^3(1/\varepsilon)})$. This improvement comes at the expense of an additional requirement on $\bar\mu^2 = O(1/(k \log(1/\varepsilon)))$ (or a similar inequality for $\mu_{\max}^2$), but this is easily satisfied in a large class of examples, discussed below in the paper. These examples in conjunction with Theorem 4.7 and the results in Section 2 establish provable error guarantees for some new classes of sampling matrices.

We note a group of papers by Bajwa and Calderbank [3, 5, 13] which is centered around the analysis of a threshold decoding procedure (OST) defined in [3]. The sufficient conditions in these works are formulated in terms of $\mu$ and maximum average coherence $\nu = \max_{1 \le j \le N} |\sum_{i \ne j} \langle \phi_i, \phi_j \rangle|$. Reliance on two coherence parameters of $\Phi$ for establishing sufficient conditions for error estimates in (1) is a shared feature of these papers and our research. At the same time, the OST procedure relies on additional assumptions such as minimum-to-average ratio of signal components bounded away from zero (in experiments, OST is efficient for uniform-looking signals, and is less so for sparse signals with occasional small components). Some other similar assumptions are required for the proofs of the noisy version of OST [4].

We note that there are a number of other studies that establish sufficient conditions for sampling matrices to provide bounded-error approximations in sparse recovery procedures, e.g., [16, 34, 35]. At the same time, these conditions are formulated in terms different from our assumptions, so no immediate comparison can be made with our results.

As a side result, we also calculate the parameters for the StRIP and SINC conditions that suffice to derive an error estimate for sparse recovery using Lasso. This result is implicit in the work of Candes and Plan [15], which also uses the SINC property of sampling matrices. The condition on sparsity for Lasso is in the form $k = O(N/(\|\Phi\|^2 \log N))$, so if $\|\Phi\|^2 \approx N/m$, this yields $k \le O(m/\log N)$. This range of parameters exceeds the range in which Basis Pursuit is shown to have good error guarantees, even with the improvement obtained in our paper. At the same time, both [15] and our calculations find error estimates in the form of bounds on $\|\Phi x - \Phi \hat x\|_2$ rather than $\|x - \hat x\|_2$, i.e., on the compressed version of the recovered signal.

In the final section of the paper we collect examples of incoherent dictionaries that satisfy our sufficient conditions for approximate recovery using Basis Pursuit. Two new examples with nearly optimal parameters that emerge are the Delsarte-Goethals dictionaries [39] and deterministic sub-Fourier dictionaries [31]. For instance, in the Delsarte-Goethals case we obtain a sketch dimension $m$ that exceeds $k$ only by the cube of a logarithmic factor, which is near-optimal, and is in line with the comments made above.

We also show that the restricted independence property of the dictionary suffices to establish the StRIP condition. Using sets of binary vectors known as orthogonal arrays, we find $(k, \delta, \varepsilon)$-StRIP dictionaries with $k = O(m^{3/7})$. At the same time, we are not able to show that restricted independence gives rise to the SINC property with good parameter estimates, so this result has no consequences for linear programming decoders.

Acknowledgment: We are grateful to Waheed Bajwa for useful feedback on an early version of this work. 



2. Statistical Incoherence Properties and Basis Pursuit 

In this section we prove approximation error bounds for recovery by Basis Pursuit from linear sketches obtained 
using deterministic matrices with the StRIP and SINC properties. 



2.1. StRIP Matrices with incoherence property. It was proved in [49] that random sparse signals sampled using matrices with the StRIP property can be recovered with high probability from low-dimensional sketches using linear programming. In this section we prove a similar result that in addition incorporates stability analysis.

Theorem 2.1. Suppose that $x$ is a generic random signal from the model $S_k$. Let $y = \Phi x$ and let $\hat x$ be the approximation of $x$ by the Basis Pursuit algorithm. Let $I$ be the set of $k$ largest coordinates of $x$. If

(1) $\Phi$ is $(k, \delta, \varepsilon)$-StRIP;

(2) $\Phi$ is $(k, \frac{(1-\delta)^2}{8 \log(2N/\varepsilon)}, \varepsilon)$-SINC,

then with probability at least $1 - 3\varepsilon$

$\|x_I - \hat x_I\|_2 \le \frac{4}{\sqrt{8 \log(2N/\varepsilon)}} \min_{x' \text{ is } k\text{-sparse}} \|x - x'\|_1$

and

$\|x_{I^c} - \hat x_{I^c}\|_1 \le 4 \min_{x' \text{ is } k\text{-sparse}} \|x - x'\|_1.$

This theorem implies that if the signal $x$ itself is $k$-sparse then the Basis Pursuit algorithm will recover it exactly. Otherwise, its output $\hat x$ will be a tight sparse approximation of $x$.

Theorem 2.1 will follow from the next three lemmas. Some of the ideas involved in their proofs are close to the techniques used in [21]. Let $h = \hat x - x$ be the error in recovery of Basis Pursuit. In the following $I \subset [N]$ refers to the support of the $k$ largest coordinates of $x$.

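Lemma 2.2. Let $s = 8 \log(2N/\varepsilon)$. Suppose that $\|(\Phi_I^T \Phi_I)^{-1}\| \le \frac{1}{1-\delta}$ and

$\|\Phi_I^T \phi_i\|_2^2 \le s^{-1}(1-\delta)^2$ for all $i \in I^c := [N] \setminus I$.

Then

$\|h_I\|_2 \le s^{-1/2} \|h_{I^c}\|_1.$

Proof. Clearly, $\Phi h = \Phi \hat x - \Phi x = 0$, so $\Phi_I h_I = -\Phi_{I^c} h_{I^c}$ and $h_I = -(\Phi_I^T \Phi_I)^{-1} \Phi_I^T \Phi_{I^c} h_{I^c}$. We obtain

$\|h_I\|_2 \le \|(\Phi_I^T \Phi_I)^{-1}\| \, \|\Phi_I^T \Phi_{I^c} h_{I^c}\|_2 \le \frac{1}{1-\delta} \sum_{i \in I^c} \|\Phi_I^T \phi_i\|_2 |h_i| \le s^{-1/2} \|h_{I^c}\|_1,$

as required.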

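Next we show that the error outside $I$ cannot be large. Below $\mathrm{sgn}(u)$ is a $\pm 1$-vector of signs of the argument vector $u$.

Lemma 2.3. Suppose that there exists a vector $v \in \mathbb{R}^N$ such that

(i) $v$ is contained in the row space of $\Phi$, say $v = \Phi^T w$;

(ii) $v_I = \mathrm{sgn}(x_I)$;

(iii) $\|v_{I^c}\|_\infty \le 1/2$.

Then

(8) $\|h_{I^c}\|_1 \le 4 \|x_{I^c}\|_1.$

Proof. By (2) we have

$\|x\|_1 \ge \|\hat x\|_1 = \|x + h\|_1 = \|x_I + h_I\|_1 + \|x_{I^c} + h_{I^c}\|_1$
$\ge \|x_I\|_1 + \langle \mathrm{sgn}(x_I), h_I \rangle + \|h_{I^c}\|_1 - \|x_{I^c}\|_1.$

Here we have used the inequality $\|a + b\|_1 \ge \|a\|_1 + \langle \mathrm{sgn}(a), b \rangle$, valid for any two real vectors $a, b$ of the same dimension, and the triangle inequality. From this we obtain

$\|h_{I^c}\|_1 \le |\langle \mathrm{sgn}(x_I), h_I \rangle| + 2\|x_{I^c}\|_1.$

Further, using the properties of $v$, we have

$|\langle \mathrm{sgn}(x_I), h_I \rangle| = |\langle v_I, h_I \rangle|$
$= |\langle v, h \rangle - \langle v_{I^c}, h_{I^c} \rangle|$
$\le |\langle \Phi^T w, h \rangle| + |\langle v_{I^c}, h_{I^c} \rangle|$
$\le |\langle w, \Phi h \rangle| + \|v_{I^c}\|_\infty \|h_{I^c}\|_1$
$\le \frac{1}{2} \|h_{I^c}\|_1.$

The statement of the lemma is now evident.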

Now we prove that such a vector $v$ as defined in the last lemma indeed exists.

Lemma 2.4. Let $x$ be a generic random signal from the model $S_k$. Suppose that the support $I$ of the $k$ largest coordinates of $x$ is fixed. Under the assumptions of Lemma 2.2 the vector

$v = \Phi^T \Phi_I (\Phi_I^T \Phi_I)^{-1} \mathrm{sgn}(x_I)$

satisfies (i)-(iii) of Lemma 2.3 with probability at least $1 - \varepsilon$.
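
The vector $v$ of Lemma 2.4 is straightforward to compute for a given support and sign pattern. The sketch below builds it for a random Gaussian dictionary and reports the largest off-support entry; the dictionary, the support `I`, the sign pattern, and the function name `dual_certificate` are all illustrative assumptions, and the Gaussian matrix is used only because it is easy to generate.

```python
import numpy as np

def dual_certificate(Phi, I, signs):
    """Compute v = Phi^T Phi_I (Phi_I^T Phi_I)^{-1} signs for the support I."""
    Phi_I = Phi[:, I]
    w = Phi_I @ np.linalg.solve(Phi_I.T @ Phi_I, signs)   # v = Phi^T w, so v is in the row space
    return Phi.T @ w

rng = np.random.default_rng(1)
Phi = rng.standard_normal((20, 40))
Phi /= np.linalg.norm(Phi, axis=0)                # unit-norm columns

I = [3, 11, 25]                                   # hypothetical support
signs = np.array([1.0, -1.0, 1.0])                # hypothetical sign pattern
v = dual_certificate(Phi, I, signs)

off_support = np.delete(np.abs(v), I)
max_off_support = off_support.max()               # condition (iii) asks for at most 1/2
```

Conditions (i) and (ii) hold by construction whenever $\Phi_I$ has full column rank, so only the bound on `max_off_support` depends on the incoherence of the dictionary and on the random signs.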

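Proof. From the definition of $v$ it is clear that it belongs to the row space of $\Phi$ and $v_I = \mathrm{sgn}(x_I)$. We have $v_i = \phi_i^T \Phi_I (\Phi_I^T \Phi_I)^{-1} \mathrm{sgn}(x_I) = \langle s_i, \mathrm{sgn}(x_I) \rangle$, where

$s_i = (\Phi_I^T \Phi_I)^{-1} \Phi_I^T \phi_i \in \mathbb{R}^k.$

We will show that $|v_i| \le \frac{1}{2}$ for all $i \in I^c$ with probability $1 - \varepsilon$.

Since the coordinates of $\mathrm{sgn}(x_I)$ are i.i.d. uniform random variables taking values in the set $\{\pm 1\}$, we can use Hoeffding's inequality to claim that

(9) $P_{R^k}(|v_i| > 1/2) \le 2 \exp\Big(-\frac{1}{8\|s_i\|_2^2}\Big).$

On the other hand, for all $i \in I^c$,

(10) $\|s_i\|_2 = \|(\Phi_I^T \Phi_I)^{-1} \Phi_I^T \phi_i\|_2 \le \frac{1}{1-\delta} \cdot \frac{1-\delta}{\sqrt{8 \log(2N/\varepsilon)}} = \frac{1}{\sqrt{8 \log(2N/\varepsilon)}}.$

Equations (9) and (10) together imply for any $i \in I^c$,

$P_{R^k}(|v_i| > \tfrac{1}{2}) \le 2 \exp\Big(-\frac{1}{8(1/\sqrt{8 \log(2N/\varepsilon)})^2}\Big) = \frac{\varepsilon}{N}.$

Using the union bound, we now obtain the following relation:

(11) $P_{R^k}(\|v_{I^c}\|_\infty > 1/2) \le \varepsilon.$

Hence $|v_i| \le \frac{1}{2}$ for all $i \in I^c$ with probability at least $1 - \varepsilon$.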
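
The Hoeffding bound used in this proof can be compared with the exact tail probability for a small example by enumerating all sign patterns. In the sketch below the vector `s_demo` is an arbitrary stand-in for $s_i$ and the helper names `exact_tail` and `hoeffding_bound` are hypothetical; the enumeration over $2^k$ patterns is only practical for small $k$.

```python
import itertools
import math
import numpy as np

def exact_tail(s, t=0.5):
    """Exact P(|<s, eps>| > t) for eps uniform on {-1, +1}^k, by enumeration."""
    k = len(s)
    hits = sum(1 for eps in itertools.product([-1.0, 1.0], repeat=k)
               if abs(float(np.dot(s, eps))) > t)
    return hits / 2 ** k

def hoeffding_bound(s, t=0.5):
    """Hoeffding bound 2 exp(-t^2 / (2 ||s||_2^2)); for t = 1/2 this is 2 exp(-1/(8 ||s||_2^2))."""
    return 2.0 * math.exp(-t ** 2 / (2.0 * float(np.dot(s, s))))

s_demo = 0.1 * np.ones(8)                         # illustrative vector with ||s||_2^2 = 0.08
p_exact = exact_tail(s_demo)
p_bound = hoeffding_bound(s_demo)
```

For this vector the sum exceeds $1/2$ in absolute value only when at most one sign disagrees with the rest, so the exact probability is $18/256$, well below the bound.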

Now we are ready to prove Theorem 2.1.

Proof of Theorem 2.1. The matrix $\Phi$ is $(k, \delta, \varepsilon)$-StRIP. Hence, with probability at least $1 - \varepsilon$, $\|(\Phi_I^T \Phi_I)^{-1}\| \le \frac{1}{1-\delta}$. At the same time, from the SINC assumption we have, with probability at least $1 - \varepsilon$ over the choice of $I$,

$\|\Phi_I^T \phi_i\|_2^2 \le \frac{(1-\delta)^2}{8 \log(2N/\varepsilon)}$

for all $i \in I^c$. Thus, $\Phi_I$ will have these two properties with probability at least $1 - 2\varepsilon$. Then from Lemma 2.2 we obtain that

$\|h_I\|_2 \le \frac{1}{\sqrt{8 \log(2N/\varepsilon)}} \|h_{I^c}\|_1$

with probability $\ge 1 - 2\varepsilon$. Furthermore, from Lemmas 2.3 and 2.4,

$\|h_{I^c}\|_1 \le 4 \|x_{I^c}\|_1$

with probability $1 - \varepsilon$. This completes the proof.

2.2. StRIP Matrices with weak incoherence property. In this section we establish a recovery guarantee of Basis Pursuit under the weak SINC condition defined earlier in the paper.

Theorem 2.5. Suppose that the sampling matrix $\Phi$ is $(k, \delta, \varepsilon)$-StRIP and $(k, \delta, \alpha, \varepsilon^2)$-WSINC, where $\alpha = (1-\delta)^2/(8 \log(2N/\varepsilon))$ and $g(\delta, t) = \exp(-(1-\delta)^2/(8t^2))$. Suppose that the signal $x$ is chosen from the generic random signal model $S_k$ and let $\hat x$ be the approximation of $x$ found by Basis Pursuit. Then with probability at least $1 - 4\varepsilon$ we have

$\|x_{I^c} - \hat x_{I^c}\|_1 \le 4 \min_{x' \text{ is } k\text{-sparse}} \|x - x'\|_1.$

If $x$ is $k$-sparse and satisfies the condition $y = \Phi x$, then this theorem asserts that Basis Pursuit will find the support of $x$. If in addition $x$ is the only $k$-sparse solution to $y = \Phi x$, then we have $\hat x = x$. Note that the WSINC property is not sufficient for the $(\ell_2, \ell_1)$ error guarantee. However, once the correct support is detected, the signal $x$ can be found by solving the overdetermined system $y = \Phi_I x_I$.

To prove Theorem 2.5 we refine the ideas used to establish Lemma 2.4.

Lemma 2.6. Suppose that the sampling matrix $\Phi$ satisfies the conditions of Theorem 2.5. For any $x \in \mathbb{R}^k$ and $I \subset [N]$ define $v(x, I) = \Phi^T \Phi_I (\Phi_I^T \Phi_I)^{-1} \mathrm{sgn}(x)$. Let

$p(I) = P_{R^k}(\|v_{I^c}(x, I)\|_\infty > 1/2).$

Then

$P_{R_k}(\{I : p(I) > \varepsilon\}) \le 3\varepsilon.$

Proof. As in the proof of Lemma 2.4 we define the vector

$s_i(I) = (\Phi_I^T \Phi_I)^{-1} \Phi_I^T \phi_i \in \mathbb{R}^k$

and let $v_i(x, I)$ be the $i$th coordinate of the vector $v(x, I)$. From now on we write simply $v_i$, $s_i$, omitting the dependence on $I$ and $x$. Let $M = M(\Phi) := \{I \in \mathcal{P}_k(N) : \|\Phi_I^T \Phi_I - \mathrm{Id}\| \le \delta\}$; then the StRIP property of $\Phi$ implies that

$P_{R_k}(M) \ge 1 - \varepsilon.$

By definition, for any $I \in M$

$\|s_i\|_2 = \|(\Phi_I^T \Phi_I)^{-1} \Phi_I^T \phi_i\|_2 \le \frac{1}{1-\delta} \|\Phi_I^T \phi_i\|_2.$

Now we split the target probability into three parts:

$P_{R_k}(\{I : p(I) > \varepsilon\}) = P_{R_k}(\{I \in M \cap A : p(I) > \varepsilon\}) + P_{R_k}(\{I \in M \cap A^c : p(I) > \varepsilon\}) + P_{R_k}(\{I \in M^c : p(I) > \varepsilon\}),$

where $A = A_\alpha(\Phi) = \{I : \|\Phi_I^T \phi_i\|_2^2 > \alpha \text{ for some } i \in I^c\}$ is the set of supports appearing in the definition of the WSINC property. If $I \in M \cap A$, i.e., it supports both StRIP and SINC properties, then (11) implies that $p(I) \le \varepsilon$, so the first term on the right-hand side equals 0. The third term refers to supports with no SINC property, whose total probability is $\le \varepsilon$. Estimating the second term by the Markov inequality, we have

(12) $P_{R_k}(\{I \in M \cap A^c : p(I) > \varepsilon\}) \le \frac{1}{\varepsilon} E_{R_k}[p(I) \, 1(I \in M \cap A^c)],$

where $1(\cdot)$ denotes the indicator random variable. We have

(13) $E_{R_k}[p(I), I \in M \cap A^c] = E_{R_k}[p(I) \, 1(I \in M \cap A^c)] = \sum_{I \in M \cap A^c} \binom{N}{k}^{-1} p(I).$

Let us first estimate $p(I)$ for $I \in M \cap A^c$ by invoking Hoeffding's inequality (9):

$p(I) = P_{R^k}(\exists i \in I^c, |v_i| > 1/2) \le \sum_{i \in I^c} P_{R^k}(|v_i| > 1/2) \le 2 \sum_{i \in I^c} \exp\Big(-\frac{(1-\delta)^2}{8\|\Phi_I^T \phi_i\|_2^2}\Big)$
$= 2(N-k) \sum_{t \in B(\Phi)} \exp\Big(-\frac{(1-\delta)^2}{8t^2}\Big) P_{R'_k}(\|\Phi_I^T \phi_i\|_2 = t \mid I).$

Substituting this result into (13), we obtain

$E_{R_k}[p(I), \{I \in M \cap A^c\}] \le 2(N-k) \sum_{t \in B(\Phi)} \exp\Big(-\frac{(1-\delta)^2}{8t^2}\Big) \sum_{I \in M \cap A^c} \binom{N}{k}^{-1} P_{R'_k}(\|\Phi_I^T \phi_i\|_2 = t \mid I)$
$\le 2(N-k) \sum_{t \in B(\Phi)} \exp\Big(-\frac{(1-\delta)^2}{8t^2}\Big) P_{R'_k}(I \in A^c, \|\Phi_I^T \phi_i\|_2 = t)$
$\le 2\varepsilon^2,$

where the last step is on account of (12) and the WSINC assumption.

Proof of Theorem 2.5. Define the set $B$ by

$B = \{I \in \mathcal{P}_k(N) : P_{R^k}(\|v_{I^c}\|_\infty > 1/2 \mid I) > \varepsilon\}.$

Recall that Theorem 2.5 is stated with respect to the random signal $x$. Therefore, let us estimate the probability

$P_{R_k \times R^k}(\{(I, x) : \|v_{I^c}\|_\infty > 1/2\})$
$= \sum_{I} P_{R^k}(\{x : \|v_{I^c}\|_\infty > 1/2\} \mid I) P_{R_k}(I)$
$= \sum_{I \in B^c} P_{R^k}(\{x : \|v_{I^c}\|_\infty > 1/2\} \mid I) P_{R_k}(I) + \sum_{I \in B} P_{R^k}(\{x : \|v_{I^c}\|_\infty > 1/2\} \mid I) P_{R_k}(I).$

We have $P_{R^k}(\{x : \|v_{I^c}\|_\infty > 1/2\} \mid I) \le \varepsilon$ for $I \in B^c$ (cf. Lemma 2.4) and $P_{R_k}(B) \le 3\varepsilon$ from Lemma 2.6, so

$P_{R_k \times R^k}(\{(I, x) : \|v_{I^c}\|_\infty > 1/2\}) \le \varepsilon + 3\varepsilon = 4\varepsilon.$

This implies that with probability $1 - 4\varepsilon$ the signal $x$ chosen from the generic random signal model satisfies the conditions of Lemma 2.3, i.e.,

$\|x_{I^c} - \hat x_{I^c}\|_1 \le 4 \|x_{I^c}\|_1.$

This completes the proof.

3. Incoherence Properties and Lasso

In this section we prove that sparse signals can be approximately recovered from low-dimensional observations using Lasso if the sampling matrices have statistical incoherence properties. The result is a modification of the methods developed in [15, 49] in that we prove that the conditions used there to bound the error of the Lasso estimate hold with high probability if $\Phi$ has both StRIP and SINC properties. The precise claim is given in the following statement.

Theorem 3.1. Let $x$ be a random $k$-sparse signal whose support satisfies the two properties of the generic random signal model $S_k$. Denote by $\hat x$ its estimate from $y = \Phi x + z$ via Lasso (3), where $z$ is an i.i.d. Gaussian vector with zero mean and variance $\sigma^2$ and where $\lambda_N = 2\sqrt{2 \log N}$. Suppose that $k \le \frac{c_0 N}{\|\Phi\|^2 \log N}$, where $c_0$ is a positive constant, and that the matrix $\Phi$ satisfies the following two properties:

(1) $\Phi$ is $(k, \frac{1}{2}, \varepsilon)$-StRIP.

(2) $\Phi$ is $(k, \frac{1}{128 \log(2N/\varepsilon)}, \varepsilon)$-SINC.

Then we have

$\|\Phi x - \Phi \hat x\|_2^2 \le C_0 k \log N \sigma^2,$

with probability at least $1 - 3\varepsilon - (N\sqrt{2\pi \log N})^{-1} - N^{-a}$, where $C_0 > 0$ is an absolute constant and $a = 0.15 \log(2N/\varepsilon) - 1$.

The following theorem is implicit in [15]; see Theorem 1.2 and Sect. 3.2 in that paper.

Theorem 3.2. (Candes and Plan) Suppose that $x$ is a $k$-sparse signal drawn from the model $S_k$, where

$k \le \frac{c_0 N}{\|\Phi\|^2 \log N},$

where $c_0 > 0$ is a constant. Let $I \subset [N]$ be the support of $x$ and suppose the following three conditions are satisfied:

(1) $\|(\Phi_I^T \Phi_I)^{-1}\| \le 2$.

(2) $\|\Phi^T z\|_\infty \le 2\sqrt{\log N}$.

(3) $\|\Phi_{I^c}^T \Phi_I (\Phi_I^T \Phi_I)^{-1} \Phi_I^T z\|_\infty + \sqrt{8 \log N} \, \|\Phi_{I^c}^T \Phi_I (\Phi_I^T \Phi_I)^{-1} \mathrm{sgn}(x_I)\|_\infty \le (2 - \sqrt{2})\sqrt{2 \log N}$.

Then

$\|\Phi x - \Phi \hat x\|_2^2 \le C_0 k (\log N) \sigma^2,$

where $C_0$ is an absolute constant.

Our aim will be to prove that conditions (1)-(3) of this theorem hold with large probability under the assumptions of Theorem 3.1.

First, it is clear that $\|\Phi^T z\|_\infty \le 2\sqrt{\log N}$ with probability at least $1 - (N\sqrt{2\pi \log N})^{-1}$. This follows simply because $z$ is an independent Gaussian vector, and has been discussed in [15] (this is also the reason for selecting the particular value of $\lambda_N$). The main part of the argument is contained in the following lemma whose proof uses some ideas of [13].

Lemma 3.3. Suppose that all eigenvalues of $\Phi_I^T \Phi_I$ lie between $1/2$ and $3/2$ (i.e., $\|\Phi_I^T \Phi_I - \mathrm{Id}\| \le 1/2$) and that for all $i \in I^c$,

$\|\Phi_I^T \phi_i\|_2^2 \le (128 \log(2N/\varepsilon))^{-1}.$

Then Condition (3) of Theorem 3.2 holds with probability at least $1 - \varepsilon - N^{-a}$ for $a = 0.15 \log(2N/\varepsilon) - 1$.

Proof. Let $i \in I^c$. Define $Z_{0,i} = \langle w_i, \mathrm{sgn}(x_I) \rangle$ and $Z_{1,i} = \langle w'_i, z \rangle$, where

$w_i = (\Phi_I^T \Phi_I)^{-1} \Phi_I^T \phi_i, \qquad w'_i = \Phi_I (\Phi_I^T \Phi_I)^{-1} \Phi_I^T \phi_i.$

Let $Z_0 = \max_{i \in I^c} |Z_{0,i}|$ and $Z_1 = \max_{i \in I^c} |Z_{1,i}|$. We will show that with high probability $Z_0 \le 1/4$ and $Z_1 \le (1.5 - \sqrt{2})\sqrt{2 \log N}$, which will imply the lemma. We compute

$\|w_i\|_2 \le \|(\Phi_I^T \Phi_I)^{-1}\| \, \|\Phi_I^T \phi_i\|_2 \le 2 \cdot \frac{1}{8\sqrt{2 \log(2N/\varepsilon)}} = \frac{1}{4\sqrt{2 \log(2N/\varepsilon)}}$

and

$\|w'_i\|_2 \le \|\Phi_I\| \, \|(\Phi_I^T \Phi_I)^{-1}\| \, \|\Phi_I^T \phi_i\|_2 \le \sqrt{\tfrac{3}{2}} \cdot 2 \cdot \frac{1}{8\sqrt{2 \log(2N/\varepsilon)}} = \frac{\sqrt{3}}{8\sqrt{\log(2N/\varepsilon)}}$

for all $i \in I^c$. Let $a_1 = 1.5 - \sqrt{2}$. Since $Z_{1,i} \sim \mathcal{N}(0, \|w'_i\|_2^2)$, we have

$\Pr(Z_1 > a_1 \sqrt{2 \log N}) \le (N - k) \Pr(|Z_{1,i}| > a_1 \sqrt{2 \log N})$
$\le \frac{2(N-k)\|w'_i\|_2}{a_1 \sqrt{2\pi (2 \log N)}} \, e^{-\frac{64}{3} a_1^2 \log N \log(2N/\varepsilon)}$
$\le \frac{2\sqrt{3}}{8 a_1 \sqrt{2\pi}} \cdot \frac{1}{\sqrt{(2 \log N) \log(2N/\varepsilon)}} \, N^{-0.15 \log(2N/\varepsilon) + 1}$
$\le N^{-a}$

(the multiplier in front of the exponent is less than 1 for all $N \ge 4$ and $\varepsilon < 1$). Further, since the signs $\mathrm{sgn}(x_i), i \in I$ are uniform i.i.d. random variables, we have

$\Pr(Z_0 > 1/4) \le (N - k) \Pr(|\langle w_i, \mathrm{sgn}(x_I) \rangle| > 1/4)$
$\le 2(N - k) e^{-1/(32\|w_i\|_2^2)}$
$\le \varepsilon.$

The proof is complete.

Theorem 3.1 is now easily established. Indeed, the assumptions of Lemma 3.3 are satisfied with probability at least $1 - 2\varepsilon$. The claim of the theorem follows from the above arguments.

4. Sufficient conditions for statistical incoherence properties 

As discussed earlier, recovery properties of sampling matrices in linear programming decoding procedures are controlled by the coherence parameter $\mu(\Phi) = \max_{i\ne j}\mu_{ij}$. In particular, the Gershgorin theorem implies that the condition $\mu = O(k^{-1})$ is sufficient for stable and robust recovery of signals with sparsity $k$. In this section we show that this result can be improved to $\mu = O(k^{-1/4})$ in that the matrix satisfies the StRIP and SINC conditions. The results of Sect. 2 then imply stable recovery of generic random $k$-sparse signals using linear programming decoding.

Let $\Phi$ be an $m \times N$ sampling matrix with columns $\phi_i$, $i = 1, \dots, N$. As above, let $\mu_{ij} = |\phi_i^T\phi_j|$. Call the matrix $\Phi$ coherence-invariant if the set $M_i := \{\mu_{ij},\ j \in [N]\setminus i\}$ is independent of $i$. Observe that most known constructions of sampling matrices satisfy this property. This includes matrices constructed from linear codes ESEE.i chirp matrices and various Reed-Muller matrices [3, 12], as well as subsampled Fourier matrices [QTI . Our arguments change slightly if the matrix is not coherence-invariant. To deal simultaneously with both cases, define the parameter $\theta = \theta(\Phi)$ as $\theta = \bar\mu^2$ if $\Phi$ is coherence-invariant and $\theta = \bar\mu^2_{\max}$ otherwise.
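
To make these parameters concrete, here is a small sketch that computes them for an explicit matrix. It assumes the conventions $\bar\mu^2 = \frac{1}{N(N-1)}\sum_{i\ne j}\mu_{ij}^2$ and $\bar\mu^2_{\max} = \max_i \frac{1}{N-1}\sum_{j\ne i}\mu_{ij}^2$, and it tests coherence invariance by comparing the sorted lists of off-diagonal coherences (a multiset comparison, which is our reading of the definition). The function names and the three-vector example are illustrative only.

```python
import math


def pairwise_coherence(cols):
    """mu[i][j] = |<phi_i, phi_j>| for unit-norm real or complex columns."""
    n = len(cols)
    return [[abs(sum(x * y.conjugate() for x, y in zip(cols[i], cols[j])))
             for j in range(n)] for i in range(n)]


def coherence_stats(cols, tol=1e-9):
    """Return (mu, mean square coherence, max mean square coherence, invariant?, theta)."""
    n = len(cols)
    mu = pairwise_coherence(cols)
    off = [[mu[i][j] for j in range(n) if j != i] for i in range(n)]
    mu_max = max(max(row) for row in off)
    row_mean_sq = [sum(v * v for v in row) / (n - 1) for row in off]
    mean_sq = sum(row_mean_sq) / n
    max_mean_sq = max(row_mean_sq)
    # Coherence invariance: every column sees the same collection of coherences.
    ref = sorted(off[0])
    invariant = all(
        all(abs(a - b) <= tol for a, b in zip(sorted(row), ref)) for row in off
    )
    theta = mean_sq if invariant else max_mean_sq
    return mu_max, mean_sq, max_mean_sq, invariant, theta


# Example: three unit vectors in the plane at mutual angles of 120 degrees.
frame = [(math.cos(math.pi / 2 + 2 * math.pi * t / 3),
          math.sin(math.pi / 2 + 2 * math.pi * t / 3)) for t in range(3)]
mu_max, mean_sq, max_mean_sq, invariant, theta = coherence_stats(frame)
print(mu_max, mean_sq, max_mean_sq, invariant, theta)
```

For this example every pair of columns has inner product $-1/2$, so the frame is coherence-invariant and $\theta$ equals the mean square coherence.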

The next theorem gives sufficient conditions for the SINC property in terms of coherence parameters of <&. 

Theorem 4.1. Let $\Phi$ be an $m \times N$ matrix with unit-norm columns, coherence $\mu$ and square coherence $\theta$. Suppose that $\Phi$ is coherence-invariant,

(14) $$\mu^4 \le \frac{(1-a)^2\beta^2}{32k(\log 2N/\epsilon)^3} \quad\text{and}\quad \theta \le \frac{a\beta}{k\log(2N/\epsilon)},$$

where $\beta > 0$ and $0 < a < 1$ are any constants. Then $\Phi$ has the $(k, \alpha, \epsilon)$-SINC property with $\alpha = \beta/\log(2N/\epsilon)$.

Before proving this theorem we will introduce some notation. Fix $j \in [N]$ and let $I_j = \{i_1, i_2, \dots, i_k\}$ be a random $k$-subset such that $j \notin I_j$. The subsets $I_j$ are chosen from the set $[N-1]$ with uniform distribution. Define random variables $Y_{j,l} = \mu_{j,i_l}^2$, $l = 1, \dots, k$. Next define a sequence of random variables $Z_{j,t}$, $t = 0, 1, \dots, k$, where

$$Z_{j,0} = \mathrm{E}_{I_j}\sum_{l=1}^{k} Y_{j,l}, \qquad Z_{j,t} = \mathrm{E}_{I_j}\Big(\sum_{l=1}^{k} Y_{j,l} \,\Big|\, Y_{j,1}, \dots, Y_{j,t}\Big), \quad t = 1, 2, \dots, k.$$

From the assumption of coherence invariance, the variables $Z_{j,t}$ for different $j$ are stochastically equivalent. Let

$$Z_t = \mathrm{E}_j Z_{j,t} = \mathrm{E}_{R'}\Big(\sum_{l=1}^{k} Y_{j,l} \,\Big|\, Y_{j,1}, Y_{j,2}, \dots, Y_{j,t}\Big), \quad t = 1, \dots, k.$$

The random variables $Z_t$ are defined on the set of $(k+1)$-subsets of $[N]$ with probability distribution $P_{R'}$. We will show that they form a Doob martingale. Begin with defining a sequence of $\sigma$-algebras $\mathcal{F}_t$, $t = 0, 1, \dots, k$, where $\mathcal{F}_0 = \{\emptyset, [N]\}$ and $\mathcal{F}_t$, $t \ge 1$, is the smallest $\sigma$-algebra with respect to which the variables $Y_{j,1}, \dots, Y_{j,t}$ are measurable (thus, $\mathcal{F}_t$ is formed of all subsets of $[N]$ of size $\le t+1$). Clearly, $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \dots \subset \mathcal{F}_k$, and for each $t$, $Z_t$ is a bounded random variable that is measurable with respect to $\mathcal{F}_t$. Observe that



(15) $$Z_0 = \mathrm{E}_j Z_{j,0} = \mathrm{E}_{R'}\sum_{l=1}^{k}\mu_{j,i_l}^2 = \sum_{l=1}^{k}\mathrm{E}_{R'}\,\mu_{j,i_l}^2 = k\bar\mu^2$$

(16) $$Z_0 \le k\bar\mu^2_{\max},$$

where (15) assumes coherence invariance, and (16) is valid independently of that assumption.

Lemma 4.2. The sequence $(Z_t, \mathcal{F}_t)_{t=0,1,\dots,k}$ forms a bounded-differences martingale, namely $\mathrm{E}_{R'}(Z_t \mid Z_0, Z_1, \dots, Z_{t-1}) = Z_{t-1}$ and

$$|Z_t - Z_{t-1}| \le 2\mu^2\Big(1 + \frac{k}{N-k-2}\Big), \quad t = 1, \dots, k.$$
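Proof. In the proof we write $\mathrm{E}$ instead of $\mathrm{E}_{R'}$. We have

$$Z_t = \mathrm{E}\Big(\sum_{l=1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_t\Big) = \sum_{l=1}^{t} Y_{j,l} + \mathrm{E}\Big(\sum_{l=t+1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_t\Big)$$
$$= Z_{t-1} + Y_{j,t} + \mathrm{E}\Big(\sum_{l=t+1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_t\Big) - \mathrm{E}\Big(\sum_{l=t}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_{t-1}\Big).$$

Next,

$$\mathrm{E}(Z_t \mid Z_0, Z_1, \dots, Z_{t-1}) = Z_{t-1} + \mathrm{E}(Y_{j,t} \mid Z_0, Z_1, \dots, Z_{t-1}) + \mathrm{E}\Big(\mathrm{E}\Big(\sum_{l=t+1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_t\Big) \,\Big|\, Z_0, \dots, Z_{t-1}\Big)$$
$$\qquad - \mathrm{E}\Big(\mathrm{E}\Big(\sum_{l=t}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_{t-1}\Big) \,\Big|\, Z_0, \dots, Z_{t-1}\Big)$$
$$= Z_{t-1} + \mathrm{E}(Y_{j,t} \mid Z_0, \dots, Z_{t-1}) + \mathrm{E}\Big(\sum_{l=t+1}^{k} Y_{j,l} \,\Big|\, Z_0, \dots, Z_{t-1}\Big) - \mathrm{E}\Big(\sum_{l=t}^{k} Y_{j,l} \,\Big|\, Z_0, \dots, Z_{t-1}\Big)$$
$$= Z_{t-1},$$

which is what we claimed.

Next we prove a bound on the random variable $|Z_t - Z_{t-1}|$. We have

$$|Z_t - Z_{t-1}| = \Big|\mathrm{E}\Big(\sum_{l=1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_t\Big) - \mathrm{E}\Big(\sum_{l=1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_{t-1}\Big)\Big|$$
$$\le \max_{a,b}\Big|\mathrm{E}\Big(\sum_{l=1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_{t-1}, Y_{j,t} = a\Big) - \mathrm{E}\Big(\sum_{l=1}^{k} Y_{j,l} \,\Big|\, \mathcal{F}_{t-1}, Y_{j,t} = b\Big)\Big|$$
$$= \max_{a,b}\Big|\sum_{l=t}^{k}\Big(\mathrm{E}(Y_{j,l} \mid \mathcal{F}_{t-1}, Y_{j,t} = a) - \mathrm{E}(Y_{j,l} \mid \mathcal{F}_{t-1}, Y_{j,t} = b)\Big)\Big|$$
$$= \max_{a,b}\Big|a - b + \sum_{l=t+1}^{k}\Big(\mathrm{E}(Y_{j,l} \mid \mathcal{F}_{t-1}, Y_{j,t} = a) - \mathrm{E}(Y_{j,l} \mid \mathcal{F}_{t-1}, Y_{j,t} = b)\Big)\Big|$$
$$\le \mu^2 + \sum_{l=t+1}^{k}\frac{2\mu^2}{N-l-2} \le 2\mu^2\,\frac{N-2}{N-k-2}.$$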



To prove Theorem l4.1l we use the Azuma-Hoeffding inequality (see, e.g., iHTI '). 

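Proposition 4.3. (Azuma-Hoeffding) Let $X_0, \dots, X_{k-1}$ be a martingale with $|X_i - X_{i-1}| \le a_i$ for each $i$, for suitable constants $a_i$. Then for any $\nu > 0$,

$$\Pr\Big(\Big|\sum_{i=1}^{k-1}(X_i - X_{i-1})\Big| \ge \nu\Big) \le 2\exp\Big(-\frac{\nu^2}{2\sum_i a_i^2}\Big).$$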

Proof of Theorem 4.1: Bounding large deviations for the sum $|\sum_{t=1}^{k}(Z_t - Z_{t-1})| = |Z_k - Z_0|$, we obtain

(17) $$\Pr(|Z_k - Z_0| > \nu) \le 2\exp\Big(-\frac{\nu^2}{8\mu^4 k\big(\frac{N-2}{N-k-2}\big)^2}\Big),$$

where the probability is computed with respect to the choice of ordered $(k+1)$-tuples in $[N]$ and $\nu > 0$ is any constant. Assume coherence invariance. Using (15) and the inequality $(N-2)/(N-k-2) < 2$ valid for all $k < \frac{N}{2} - 1$, we obtain

$$\Pr(Z_k > \nu + k\bar\mu^2) \le \Pr(|Z_k - k\bar\mu^2| > \nu) \le 2\exp\Big(-\frac{\nu^2}{32\mu^4 k}\Big).$$

Now take $\beta > 0$ and $\nu = \frac{(1-a)\beta}{\log(2N/\epsilon)}$. Suppose that for some $a \in (0,1)$

(18) $$k\bar\mu^2 \le \frac{a\beta}{\log(2N/\epsilon)},$$

then we obtain

(19) $$\Pr\Big(Z_k > \frac{\beta}{\log(2N/\epsilon)}\Big) \le 2\exp\Big(-\frac{(1-a)^2\beta^2}{32\mu^4 k\log^2(2N/\epsilon)}\Big) \le \frac{\epsilon}{N}.$$

Now the first claim of Theorem 4.1 follows by the union bound with respect to the choice of the index $j$.

Assume that $\Phi$ does not satisfy the invariance condition. Then we rely on (16) and repeat the above argument with respect to $\bar\mu^2_{\max}$. ■
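
As a numerical illustration of how the two coherence conditions feed into the tail bound, the sketch below evaluates the Azuma-Hoeffding estimate at the boundary of the condition on $\mu^4$. It assumes the forms $\mu^4 \le (1-a)^2\beta^2/(32k\log^3(2N/\epsilon))$, $\theta \le a\beta/(k\log(2N/\epsilon))$, the deviation $\nu = (1-a)\beta/\log(2N/\epsilon)$, and the tail $2\exp(-\nu^2/(32\mu^4k))$ as read from the proof; the function names and the parameter values are hypothetical.

```python
import math


def sinc_conditions_hold(mu, theta, k, N, eps, a, beta):
    """Check the two assumed coherence conditions of Theorem 4.1."""
    L = math.log(2 * N / eps)
    mu4_ok = mu ** 4 <= (1 - a) ** 2 * beta ** 2 / (32 * k * L ** 3)
    theta_ok = theta <= a * beta / (k * L)
    return mu4_ok and theta_ok


def azuma_tail(mu, k, nu):
    """Tail bound 2 exp(-nu^2 / (32 mu^4 k)), valid for k < N/2 - 1."""
    return 2.0 * math.exp(-nu ** 2 / (32 * mu ** 4 * k))


# Hypothetical parameter values, chosen only for illustration.
k, N, eps, a, beta = 10, 1000, 0.01, 0.5, 1.0
L = math.log(2 * N / eps)

# Largest coherence allowed by the condition on mu^4.
mu_boundary = ((1 - a) ** 2 * beta ** 2 / (32 * k * L ** 3)) ** 0.25
nu = (1 - a) * beta / L
tail = azuma_tail(mu_boundary, k, nu)
print(tail, eps / N)
```

At the boundary value of $\mu$ the exponent equals $\log(2N/\epsilon)$, so the tail equals $\epsilon/N$ and the union bound over the $N$ choices of $j$ gives total failure probability $\epsilon$.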

The above proof contains the following statement. 

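Corollary 4.4. Let $\Phi$ be an $m \times N$ matrix with coherence $\mu$ and $\theta = \bar\mu^2$ or $\bar\mu^2_{\max}$, as appropriate. Let $a \in (0,1)$ and $\beta > 0$ be any constants. Suppose that for $\alpha < \beta\log_2 e$,

$$\mu^4 \le \frac{(1-a)^2\alpha^3}{32k\beta} \quad\text{and}\quad \theta \le \frac{a\alpha}{k}.$$

Then $P_{R'}\big(\sum_{l=1}^{k}\mu_{j,i_l}^2 > \alpha\big) \le 2e^{-\beta/\alpha}$.

Proof. Denote $\alpha = \beta/\log(2N/\epsilon)$, then $\epsilon/N = 2e^{-\beta/\alpha}$. The claim is obtained by substituting $\alpha$ in (18)-(19). ■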

We note that this corollary follows directly from the SINC property under our assumptions on coherence and mean 
square coherence. We observe that the SINC property naturally implies some StRIP condition as given in the following 
theorem. 



Theorem 4.5. Let $\Phi$ be an $m \times N$ matrix. Let $I \subset [N]$ be a random ordered $k$-subset and suppose that for all $j \in I$, $P_{R'}\big(\sum_{m\ne j}\mu_{j,i_m}^2 > \delta^2/k\big) \le \epsilon_1/k$. Then $\Phi$ is a $(k, \delta, \epsilon_1)$-StRIP matrix.

Proof. Given $I$ let $H(I) = \Phi_I^T\Phi_I - \mathrm{Id}$ be the "hollow Gram matrix". Let $B = \{I : \|H(I)\|_2 > \delta\} \subset \mathcal{P}_k(N)$. We need to prove that $P_{R_k}(B) \le \epsilon_1$. Let $(e_1, \dots, e_k)$ be the standard basis of $\mathbb{R}^k$. Define a subset $C \subset \mathcal{P}_k(N)$ as follows:

$$C = \{I : \exists\, i \in I \text{ s.t. } \|H(I)e_i\|_2 > \delta/\sqrt{k}\}.$$

Let us show that $B \subset C$ by proving $C^c \subset B^c$. Indeed, if $I \in C^c$, then we have

$$\|H(I)\|_2 = \max_{\|x\|_2=1}\|H(I)x\|_2 = \max_{\|x\|_2=1}\|H(I)(x_1e_1 + x_2e_2 + \dots + x_ke_k)\|_2$$
$$\le \max_{\|x\|_2=1}\sum_{l=1}^{k}|x_l|\,\|H(I)e_l\|_2$$
$$\le \max_{\|x\|_2=1}\|x\|_1\max_{1\le l\le k}\|H(I)e_l\|_2$$
$$\le \sqrt{k}\max_{1\le l\le k}\|H(I)e_l\|_2$$
$$\le \delta,$$

which implies $I \in B^c$. Now since $B \subset C$, we only need to show that $P_{R_k}(C) \le \epsilon_1$.

Careful readers may have already noticed that the target quantity $P_{R_k}(C)$ uses a different probability measure from that in the theorem's assumption. We note that a change of measure is actually inevitable since the probability measure in the Azuma-Hoeffding inequality we used in Proposition 4.3 is with respect to ordered $k$-tuples while that in the definition of StRIP is with respect to unordered ones. In the following, we provide a rigorous calculation that supports this measure transformation.

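For any $I \in C$, by definition, there exists at least one $l \in I$ such that $\|H(I)e_l\|_2 > \delta/\sqrt{k}$. Among such $l$, let $i(I)$ be the smallest one: $i(I) = \min\{l \in I : \|H(I)e_l\|_2 > \delta/\sqrt{k}\}$. Now we define a map from an unordered $k$-tuple $I \in C \subset \mathcal{P}_k(N)$ to a set of ordered $k$-tuples

$$Q(I) = \{(i_1, \dots, i_{k-1}, i(I)) : (i_1, \dots, i_{k-1}) = \sigma(I\setminus i(I)),\ \sigma \in S_{k-1}\},$$

where $S_{k-1}$ denotes the set of all permutations of $k-1$ elements. Obviously, $|Q(I)| = (k-1)!$ for all $I$, and $Q(I_1) \cap Q(I_2) = \emptyset$ for distinct $k$-subsets $I_1, I_2$. Moreover, if $(i_1, \dots, i_k) \in Q(I)$, then $\|H(I)e_k\|_2 > \delta/\sqrt{k}$ or $\sum_{l=1}^{k-1}\mu_{i_l,i_k}^2 > \delta^2/k$. Therefore

$$\bigcup_{I\in C}Q(I) \subset \Big\{(i_1, \dots, i_k) \subset [N] : \sum_{l=1}^{k-1}\mu_{i_l,i_k}^2 > \delta^2/k\Big\}.$$

Now compute

$$P_{R_k}(C) = \frac{|C|}{\binom{N}{k}} = \frac{|C|(k-1)!}{\binom{N}{k}(k-1)!} = \frac{\big|\bigcup_{I\in C}Q(I)\big|}{\binom{N}{k}(k-1)!}$$
$$\le \frac{\big|\{(i_1, \dots, i_k) \subset [N] : \sum_{l=1}^{k-1}\mu_{i_l,i_k}^2 > \delta^2/k\}\big|}{\binom{N}{k}(k-1)!}$$
$$= k\,P_{R'}\Big(\sum_{m=1}^{k-1}\mu_{i_m,i_k}^2 > \delta^2/k\Big).$$

By the assumption of the theorem the last expression is at most $\epsilon_1$, which proves our claim. ■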

Theorem 4.5 implies the following

Corollary 4.6. Let $\Phi$ be an $m \times N$ matrix. If

$$\theta \le \frac{a\delta^2}{k^2} \quad\text{and}\quad \mu^4 \le \frac{(1-a)^2\delta^4}{32k^3\log(2k/\epsilon_1)},$$

where $0 < a < 1$, then $\Phi$ is $(k, \delta, \epsilon_1)$-StRIP.

Proof. Take $\alpha = \delta^2/k$ and $\epsilon_1 = 2ke^{-\beta/\alpha}$, then $\beta = \frac{\delta^2}{k}\log(2k/\epsilon_1)$. The claim is obtained by substituting this value into the conditions of Corollary 4.4. ■

Observe that the sufficient condition for the $(k, \delta)$-RIP property from the Gershgorin theorem is $\mu \le \delta/k$, so the result of Corollary 4.6 gives a better result, namely $\mu = O(k^{-3/4})$. At the same time, Tropp's result in [49, Thm. B] implies that the matrix $\Phi$ is $(k, \delta, \epsilon)$-StRIP under a weaker (i.e., more inclusive) condition. Below we improve upon these results by analyzing the StRIP property directly rather than relying on the SINC condition.

Theorem 4.7. Let $\Phi$ be an $m \times N$ matrix and let $\theta = \bar\mu^2$ or $\theta = \bar\mu^2_{\max}$, depending on whether $\Phi$ is coherence-invariant or not. Let $\epsilon < \min\{1/k, e^{1-1/\log 2}\}$ and suppose that $\Phi$ satisfies

(20) $$k\mu^4 \le \frac{1}{\log^2(1/\epsilon)}\min\Big(\frac{(1-a)^2b^2}{32\log(2k)\log(e/\epsilon)},\ c^2\Big) \quad\text{and}\quad k\theta \le \frac{ab}{\log(1/\epsilon)},$$

where $a, b, c \in (0,1)$ are constants such that

(21) $$\sqrt{b} + \sqrt{2ab} + \sqrt{c} + \frac{k}{3\sqrt{2}N}\|\Phi\|^2 \le e^{-1/4}\delta/(6\sqrt{2}).$$

Then $\Phi$ is $(k, \delta, \epsilon)$-StRIP.

The proof relies on several results from [49]. The following theorem is a modification of Theorem 25 in that paper. Below $R$ denotes a linear operator that performs a restriction to $k$ coordinates chosen according to some rule (e.g., randomly). Its domain is determined by the context. Its adjoint $R^*$ acts on $\mathbb{R}^k$ by padding the $k$-vector with the appropriate number of zeros.

Theorem 4.8. (Decoupling of the spectral norm) Let $A$ be a $2N \times 2N$ symmetric matrix with zero diagonal. Let $\eta \in \{0,1\}^{2N}$ be a random vector with $N$ components equal to one. Define the index sets $T_1(\eta) = \{i : \eta_i = 0\}$, $T_2(\eta) = \{i : \eta_i = 1\}$. Let $R$ be a random restriction to $k$ coordinates. For any $q \ge 1$ we have

(22) $$(\mathrm{E}\|RAR^*\|^q)^{1/q} \le 2\max_{k_1+k_2=k}\mathrm{E}_\eta\big(\mathrm{E}\|R_1A_{T_1(\eta)\times T_2(\eta)}R_2^*\|^q\big)^{1/q},$$

where $A_{T_1(\eta)\times T_2(\eta)}$ denotes the submatrix of $A$ indexed by $T_1(\eta) \times T_2(\eta)$ and the matrices $R_i$ are independent restrictions to $k_i$ coordinates from $T_i$, $i = 1, 2$.

When $A$ has order $(2N+1) \times (2N+1)$, then an analogous result holds for partitions into blocks of size $N$ and $N+1$.

Inequality (22) is implicitly proved in the proof of the decoupling theorem (Theorem 9) [49]. The ideas behind it are due to [38].

The next lemma is due to Tropp BH1 and Rudelson and Vershinin 11441 . 

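Lemma 4.9. Suppose that $A$ is a matrix with $N$ columns and let $R$ be a random restriction to $k$ coordinates. Let $q \ge 2$, $p = \max(2, 2\log(\operatorname{rk} AR^*), q/2)$. Then

$$(\mathrm{E}\|AR^*\|^q)^{1/q} \le 3\sqrt{p}\,(\mathrm{E}\|AR^*\|_{1\to2}^q)^{1/q} + \sqrt{\frac{k}{N}}\,\|A\|,$$

where $\|\cdot\|_{1\to2}$ is the maximum column norm.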



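The following lemma is a simple application of Markov's inequality; a similar result can be found in [38, Lemma 4.10]; see also

Lemma 4.10. Let $q, \lambda > 0$ and let $\xi_q$ be a positive function of $q$. Suppose that $Z$ is a positive random variable whose $q$th moment satisfies the bound

$$(\mathrm{E}Z^q)^{1/q} \le \xi_q\sqrt{q} + \lambda.$$

Then

$$P\big(Z \ge e^{1/4}(\xi_q\sqrt{q} + \lambda)\big) \le e^{-q/4}.$$

Proof: By the Markov inequality,

$$P\big(Z \ge e^{1/4}(\xi_q\sqrt{q} + \lambda)\big) \le \frac{\mathrm{E}Z^q}{\big(e^{1/4}(\xi_q\sqrt{q} + \lambda)\big)^q} \le \Big(\frac{\xi_q\sqrt{q} + \lambda}{e^{1/4}(\xi_q\sqrt{q} + \lambda)}\Big)^q = e^{-q/4}.$$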

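The main part of the proof is contained in the following lemma.

Lemma 4.11. Let $\Phi$ be an $m \times N$ matrix with coherence parameter $\mu$. Suppose that for some $0 < \epsilon_1, \epsilon_2 < 1$

(23) $$P_{R'}\big(\{(I, i) : \|\Phi_I^T\phi_i\|_2^2 \ge \epsilon_1\} \mid i\big) \le \epsilon_2.$$

Let $R$ be a random restriction to $k$ coordinates and $H = \Phi^T\Phi - \mathrm{Id}$. For any $q \ge 2$, $p = \max(2, 2\log(\operatorname{rk} RHR^*), q/2)$ we have

(24) $$(\mathrm{E}\|RHR^*\|^q)^{1/q} \le 6\sqrt{p}\big(\sqrt{\epsilon_1} + (k\epsilon_2)^{1/q}\mu\sqrt{k} + \sqrt{2k\theta}\big) + \frac{2k}{N}\|\Phi\|^2.$$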



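Proof. We begin with setting the stage to apply Theorem 4.8. Let $\eta \in \{0,1\}^N$ be a random vector with $N/2$ ones and let $R_1, R_2$ be random restrictions to $k_i$ coordinates in the sets $T_i(\eta)$, $i = 1, 2$, respectively. Denote by $\operatorname{supp}(R_i)$, $i = 1, 2$ the set of indices selected by $R_i$ and let $H(\eta) := H_{T_1(\eta)\times T_2(\eta)}$. Let $q \ge 1$ and let us bound the term $\mathrm{E}_\eta(\mathrm{E}\|R_1H(\eta)R_2^*\|^q)^{1/q}$ that appears on the right side of (22). The expectation in the $q$-norm is computed for two random restrictions $R_1$ and $R_2$ that are conditionally independent given $\eta$. Let $\mathrm{E}_i$ be the expectation with respect to $R_i$, $i = 1, 2$. Given $\eta$ we can evaluate these expectations in succession and apply Lemma 4.9 to $\mathrm{E}_2$:

$$\mathrm{E}_\eta\big(\mathrm{E}\|R_1H(\eta)R_2^*\|^q\big)^{1/q} = \mathrm{E}_\eta\big[\mathrm{E}_1\big(\mathrm{E}_2\|R_1H(\eta)R_2^*\|^q\big)\big]^{1/q}$$
$$\le \mathrm{E}_\eta\Big[\mathrm{E}_1\Big(3\sqrt{p}\,\big(\mathrm{E}_2\|R_1H(\eta)R_2^*\|_{1\to2}^q\big)^{1/q} + \sqrt{\frac{2k_2}{N}}\,\|R_1H(\eta)\|\Big)^q\Big]^{1/q}$$
$$\le 3\sqrt{p}\,\mathrm{E}_\eta\big[\mathrm{E}_1\mathrm{E}_2\|R_1H(\eta)R_2^*\|_{1\to2}^q\big]^{1/q} + \sqrt{\frac{2k_2}{N}}\,\mathrm{E}_\eta\big(\mathrm{E}_1\|R_1H(\eta)\|^q\big)^{1/q},$$

where on the last line we used the Minkowski inequality (recall that the random variables involved are finite). Now use Lemma 4.9 again to obtain

(25) $$\mathrm{E}_\eta\big(\mathrm{E}\|R_1H(\eta)R_2^*\|^q\big)^{1/q} \le 3\sqrt{p}\,\mathrm{E}_\eta\big[\mathrm{E}_1\mathrm{E}_2\|R_1H(\eta)R_2^*\|_{1\to2}^q\big]^{1/q} + 3\sqrt{\frac{2pk_2}{N}}\,\mathrm{E}_\eta\big(\mathrm{E}_1\|H(\eta)^*R_1^*\|_{1\to2}^q\big)^{1/q} + \sqrt{\frac{4k_1k_2}{N^2}}\,\mathrm{E}_\eta\|H(\eta)\|.$$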

Let us examine the three terms on the right-hand side of the last expression. Let $\eta(R_2)$ be the random vector conditional on the choice of $k_2$ coordinates. The sample space for $\eta(R_2)$ is formed of all the vectors $\eta \in \{0,1\}^N$ such that $\operatorname{supp}(R_2) \subset T_2(\eta)$. In other words, this is a subset of the sample space $\{0,1\}^N$ that is compatible with a given $R_2$. The random restriction $R_1$ is still chosen out of $T_1(\eta)$ independently of $R_2$. Denote by $\tilde{R}$ a random restriction to $k_1$ indices in the set $(\operatorname{supp}(R_2))^c$ and let $\tilde{\mathrm{E}}$ be the expectation computed with respect to it. We can write

$$\mathrm{E}_\eta\big(\mathrm{E}_1\mathrm{E}_2\|R_1H(\eta)R_2^*\|_{1\to2}^q\big)^{1/q} \le \big(\mathrm{E}_\eta\mathrm{E}_1\mathrm{E}_2\|R_1H(\eta)R_2^*\|_{1\to2}^q\big)^{1/q} = \big(\mathrm{E}_2\tilde{\mathrm{E}}\|\tilde{R}H(\eta)R_2^*\|_{1\to2}^q\big)^{1/q}.$$

Recall that $H_{ij} = \mu_{ij}1_{\{i\ne j\}}$ and that $\tilde{R}$ and $R_2$ are 0-1 matrices. Using this in the last equation, we obtain

(26) $$\mathrm{E}_2\tilde{\mathrm{E}}\|\tilde{R}H(\eta)R_2^*\|_{1\to2}^q \le \mathrm{E}_2\tilde{\mathrm{E}}\max_{j\in\operatorname{supp}(R_2)}\Big(\sum_{i\in\operatorname{supp}(\tilde{R})}\mu_{ij}^2\Big)^{q/2}.$$

Now let us invoke assumption (23). Recalling that $k_1 < k$, we have

Thus with probability $1 - k_2\epsilon_2$ the sum in (26) is bounded above by $\epsilon_1$. For the other instances we use the trivial bound $k_1\mu^2$. We obtain

$$3\sqrt{p}\,\mathrm{E}_\eta\mathrm{E}_1\big(\mathrm{E}_2\|R_1H(\eta)R_2^*\|_{1\to2}^q\big)^{1/q} \le 3\sqrt{p}\big((1 - k_2\epsilon_2)\epsilon_1^{q/2} + k_2\epsilon_2(k_1\mu^2)^{q/2}\big)^{1/q}$$
$$\le 3\sqrt{p}\big(\epsilon_1^{q/2} + k_2\epsilon_2(k_1\mu^2)^{q/2}\big)^{1/q}$$
$$\le 3\sqrt{p}\big(\sqrt{\epsilon_1} + (k\epsilon_2)^{1/q}\mu\sqrt{k}\big),$$

where in the last step we used the inequality $a^q + b^q \le (a+b)^q$ valid for all $q \ge 1$ and positive $a, b$. Let us turn to the second term on the right-hand side of (25). Assuming coherence invariance, we observe that

$$\|H(\eta)^*R_1^*\|_{1\to2} \le \max_{j\in T_1(\eta)}\|H_{j,T_2(\eta)}\|_2 \le \max_{j\in[N]}\|H_{j,\cdot}\|_2 \le \sqrt{N\bar\mu^2},$$

where $H_{j,\cdot}$ denotes the $j$th row of $H$ and $H_{j,T_2(\eta)}$ is a restriction of the $j$th row to the indices in $T_2(\eta)$. At the same time, if the dictionary is not coherence-invariant, then in the last step we estimate the maximum norm from above by $\sqrt{N\bar\mu^2_{\max}}$, so overall the norm in the second term is not greater than $\sqrt{N\theta}$. Finally, the third term in (25) can be bounded as follows:

$$\sqrt{\frac{4k_1k_2}{N^2}}\,\mathrm{E}_\eta\|H(\eta)\| \le \frac{k_1+k_2}{N}\|H\| = \frac{k}{N}\|\Phi^T\Phi - I_N\|$$
$$\le \frac{k}{N}\max(1, \|\Phi\|^2 - 1) \le \frac{k}{N}\|\Phi\|^2,$$

where the last step uses the fact that the columns of $\Phi$ have unit norm, and so $\|\Phi\|^2 \ge N/m > 1$. Combining all the information accumulated up to this point in (25), we obtain

$$\mathrm{E}_\eta\big(\mathrm{E}\|R_1H(\eta)R_2^*\|^q\big)^{1/q} \le 3\sqrt{p}\big(\sqrt{\epsilon_1} + (k\epsilon_2)^{1/q}\mu\sqrt{k} + \sqrt{2k\theta}\big) + \frac{k}{N}\|\Phi\|^2.$$

Finally, use this estimate in (22) to obtain the claim of the lemma. ■
Proof of Theorem 4.7

Proof. The strategy is to fix a triple $a, b, c \in (0,1)$ that satisfies (21) and to prove that (20) implies $(k, \delta, \epsilon)$-StRIP. Let $\epsilon_1 = \frac{b}{\log(1/\epsilon)}$ and $\epsilon_2 = k^{-1+\log\epsilon}$. In Corollary 4.4 set $\alpha = \epsilon_1$ and $\beta = \alpha\log(2/\epsilon_2)$. Under the assumptions in (20) this corollary implies that

$$P_{R'}\Big(\sum_{m=1}^{k}\mu_{i,i_m}^2 > \epsilon_1\Big) \le \epsilon_2.$$

Invoking Lemma 4.11 we conclude that (24) holds with the current values of $\epsilon_1, \epsilon_2$. For any $q \ge 4\log k$ we have $p = q/2$, and thus (24) becomes

(27) $$(\mathrm{E}\|RHR^*\|^q)^{1/q} \le 3\sqrt{2q}\big(\sqrt{\epsilon_1} + (k\epsilon_2)^{1/q}\mu\sqrt{k} + \sqrt{2k\theta}\big) + \frac{2k}{N}\|\Phi\|^2.$$

Introduce the following quantities:

$$\xi_q = 3\sqrt{2}\big(\sqrt{\epsilon_1} + (k\epsilon_2)^{1/q}\mu\sqrt{k} + \sqrt{2k\theta}\big) \quad\text{and}\quad \lambda = \frac{2k}{N}\|\Phi\|^2.$$

Now (27) matches the assumption of Lemma 4.10 and we obtain

(28) $$P_{R_k}\big(\|RHR^*\| \ge e^{1/4}(\xi_q\sqrt{q} + \lambda)\big) \le e^{-q/4}.$$

Choose $q = 4\log(1/\epsilon)$, which is consistent with our earlier assumptions on $k$, $q$, and $\epsilon$. With this, we obtain

$$P_{R_k}\big(\|RHR^*\| \ge e^{1/4}(\xi_q\sqrt{q} + \lambda)\big) \le \epsilon.$$

Now observe that $\|RHR^*\| \le \delta$ is precisely the RIP property for the support identified by the matrix $R$. Let us verify that the inequality

$$e^{1/4}(\xi_q\sqrt{q} + \lambda) \le \delta$$

is equivalent to (21). This is shown by substituting $\epsilon_1$ and $\epsilon_2$ with their definitions, and $\mu$ and $\theta$ with their bounds in the statement of the theorem. Thus, $P_{R_k}(\|RHR^*\| > \delta) \le \epsilon$, which establishes the StRIP property of $\Phi$. ■


5. Examples and extensions 

5.1. Examples of sampling matrices. It is known [27] that experimental performance of many known RIP sampling matrices in sparse recovery is far better than predicted by the theoretical estimates. Theorems 4.1 and 4.7 provide some insight into the reasons for such behavior. As an example, take binary matrices constructed from the Delsarte-Goethals codes [39, p. 461]. The parameters of the matrices are as follows:

(29) $$m = 2^{2s+2}, \quad N = 2^{-r}m^{r+2}, \quad \mu = 2^rm^{-1/2},$$

where $s \ge 0$ is any integer, and where for a fixed $s$, the parameter $r$ can be any number in $\{0, 1, \dots, s-1\}$. If we take $s$ to be an odd integer and set $r = (s+1)/2$, then we obtain

$$m = 2^{4r}, \quad N = 2^{4r^2+7r}, \quad \mu = m^{-1/4}.$$

The matrix $\Phi$ is coherence-invariant, so we put $\theta = \bar\mu^2$. Lemma 5.3 below implies that

(30) $$\bar\mu^2 = \frac{N-m}{m(N-1)} < \frac{1}{m},$$

and the norm of the sampling matrix satisfies $\|\Phi\| = \sqrt{N/m}$. Thus for $\mu$ and $\bar\mu^2$ to satisfy the assumptions in Theorems 4.1 and 4.7 we only need $m$, $N$, and $k$ to satisfy the relation $m = \Theta(k\log^3\frac{N}{\epsilon})$, which is nearly optimal. Similar logic leads to derivations of such relations for other matrices. We summarize these arguments in the next proposition, which shows that matrices with nearly optimal sketch length support high-probability recovery of sparse signals chosen from the generic signal model.
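
The parameter arithmetic for the Delsarte-Goethals example can be verified with exact integer computations. The sketch below works with base-2 logarithms of $m$, $N$ and $\mu$, assuming $m = 2^{2s+2}$, $N = 2^{-r}m^{r+2}$, $\mu = 2^rm^{-1/2}$ and $\bar\mu^2 = (N-m)/(m(N-1))$; the function names are illustrative.

```python
from fractions import Fraction


def dg_log2_parameters(s, r):
    """Return (log2 m, log2 N, log2 mu) for the Delsarte-Goethals family."""
    if not 0 <= r <= s - 1:
        raise ValueError("r must lie in {0, 1, ..., s - 1}")
    log2_m = 2 * s + 2
    log2_N = (r + 2) * log2_m - r      # N = 2^(-r) * m^(r+2)
    log2_mu = r - (s + 1)              # mu = 2^r * m^(-1/2)
    return log2_m, log2_N, log2_mu


def mean_square_coherence(m, N):
    """Exact value of (N - m) / (m (N - 1))."""
    return Fraction(N - m, m * (N - 1))


# Special case: s odd and r = (s + 1) / 2, here s = 3 and r = 2.
s, r = 3, 2
log2_m, log2_N, log2_mu = dg_log2_parameters(s, r)
print(log2_m, log2_N, log2_mu)
print(mean_square_coherence(2 ** log2_m, 2 ** log2_N) < Fraction(1, 2 ** log2_m))
```

For $s = 3$, $r = 2$ this reproduces $m = 2^{4r}$, $N = 2^{4r^2+7r}$ and $\mu = m^{-1/4}$, and confirms that the mean square coherence stays below $1/m$.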

Proposition 5.1. Let $\Phi$ be an $m \times N$ sampling matrix. Suppose that it has coherence parameter $\mu = O(m^{-1/4})$ and $\theta = O(m^{-1})$, where $\theta = \bar\mu^2$ or $\theta = \bar\mu^2_{\max}$ according as $\Phi$ is coherence-invariant or not, and

$$\|\Phi\| = O(\sqrt{N/k}).$$

If $m = \Theta(k(\log(N/\epsilon))^3)$, then $\Phi$ supports sparse recovery under Basis Pursuit for all but an $\epsilon$ proportion of $k$-sparse signals chosen from the generic random signal model $S_k$.

We remark that the conditions on (mean or maximum) square coherence are generally easy to achieve. As seen from Table 1 below, they are satisfied by most examples considered in the existing literature, including both random and deterministic constructions. The most problematic quantity is the coherence parameter $\mu$. It might either be large itself, or have a large theoretical bound. Compared to earlier work, our results rely on a more relaxed condition on $\mu$, enabling us to establish near-optimality for new classes of matrices. For readers' convenience, we summarize in Table 1 a list of such optimal matrices along with several of their useful properties. A systematic description of all but the last two classes of matrices can be found in [4]. Therefore we limit ourselves to giving definitions and performing some not immediately obvious calculations of the newly defined parameter, the mean square coherence.

Normalized Gaussian Frames. A normalized Gaussian frame is obtained by normalizing each column of a Gaussian matrix with independent, Gaussian-distributed entries that have zero mean and unit variance. The mutual coherence and spectral norm of such matrices were characterized in [4] (see Table 1). These results together with the relation $\bar\mu^2_{\max} \le \mu^2$ lead to a trivial upper bound on $\bar\mu^2_{\max}$, namely $\bar\mu^2_{\max} \le 15\log N/m$. Since this bound is already tight enough for $\bar\mu^2_{\max}$ to satisfy the assumption of Proposition 5.1 and to avoid distraction from the main goals of the paper, we made no attempt to refine it here.

Random Harmonic Frames: Let $\mathcal{F}$ be an $N \times N$ discrete Fourier transform matrix, i.e., $\mathcal{F}_{j,k} = \frac{1}{\sqrt{N}}e^{2\pi ijk/N}$. Let $\eta_i$, $i = 1, \dots, N$, be a sequence of independent Bernoulli random variables with mean $\frac{m}{N}$. Set $\mathcal{M} = \{i : \eta_i = 1\}$ and use $\mathcal{F}_{\mathcal{M}}$ to denote the submatrix of $\mathcal{F}$ whose row indices lie in $\mathcal{M}$. Then the random matrix $\sqrt{N/|\mathcal{M}|}\,\mathcal{F}_{\mathcal{M}}$ is called a random harmonic frame I12T1IT81 . In the next proposition we compute the mean square coherence for all realizations of this matrix.

Proposition 5.2. All instances of the random harmonic frames are coherence invariant with the following mean square coherence

$$\bar\mu^2 = \frac{N - |\mathcal{M}|}{(N-1)|\mathcal{M}|}.$$



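Proof: For each $t \in [|\mathcal{M}|]$, let $a_t$ be the $t$-th member of $\mathcal{M}$. To prove coherence invariance, we only need to show that $\{\mu_{j,k} : k \in [N]\setminus j\} = \{\mu_{N,k} : k \in [N-1]\}$ holds for all $j \in [N]$. This is true since

$$\mu_{j,k} = \frac{1}{|\mathcal{M}|}\Big|\sum_{t=1}^{|\mathcal{M}|}e^{2\pi i(j-k)a_t/N}\Big| = \mu_{N,(k-j+N)\bmod N} \quad\text{for all } k \ne j.$$

In words, the $k$th coherence in the set $\{\mu_{j,k},\ k \in [N]\setminus j\}$ is exactly the $((k-j+N) \bmod N)$-th coherence in $\{\mu_{N,k},\ k \in [N-1]\}$, therefore the two sets are equal. We proceed to calculate the mean square coherence,

$$\bar\mu^2 = \frac{1}{N(N-1)}\sum_{j\ne k,\ j,k=1}^{N}\mu_{j,k}^2 = \frac{1}{N(N-1)|\mathcal{M}|^2}\sum_{j\ne k,\ j,k=1}^{N}\Big|\sum_{t=1}^{|\mathcal{M}|}e^{2\pi i(j-k)a_t/N}\Big|^2$$
$$= \frac{1}{N(N-1)|\mathcal{M}|^2}\sum_{j\ne k,\ j,k=1}^{N}\ \sum_{t_1,t_2=1}^{|\mathcal{M}|}e^{2\pi i(j-k)(a_{t_1}-a_{t_2})/N}$$
$$= \frac{1}{N(N-1)|\mathcal{M}|^2}\Big(\sum_{j\ne k,\ j,k=1}^{N}\ \sum_{t_1=t_2=1}^{|\mathcal{M}|}1 + \sum_{t_1\ne t_2,\ t_1,t_2=1}^{|\mathcal{M}|}\ \sum_{k=1}^{N}\sum_{j\ne k}e^{2\pi i(j-k)(a_{t_1}-a_{t_2})/N}\Big)$$
$$= \frac{1}{N(N-1)|\mathcal{M}|^2}\big(N(N-1)|\mathcal{M}| - |\mathcal{M}|(|\mathcal{M}|-1)N\big)$$
$$= \frac{N - |\mathcal{M}|}{(N-1)|\mathcal{M}|}.$$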
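
The identity of Proposition 5.2 is easy to confirm numerically on a small instance. The sketch below builds the columns of $\sqrt{N/|\mathcal{M}|}\,\mathcal{F}_{\mathcal{M}}$ for a fixed row set and compares the empirical mean square coherence with $(N-|\mathcal{M}|)/((N-1)|\mathcal{M}|)$. The row set is drawn with a fixed seed as a uniform subset of a given size rather than by Bernoulli sampling, which is a simplification on our part, since the identity holds for any row set. Function names are illustrative.

```python
import cmath
import math
import random


def harmonic_frame_columns(N, rows):
    """Unit-norm columns of sqrt(N/|M|) F_M, with F the DFT matrix scaled by 1/sqrt(N)."""
    scale = 1.0 / math.sqrt(len(rows))
    return [[scale * cmath.exp(2j * math.pi * j * a / N) for a in rows]
            for j in range(N)]


def pairwise_coherence(cols):
    """mu[j][k] = |<phi_j, phi_k>| for all pairs of columns."""
    n = len(cols)
    return [[abs(sum(x * y.conjugate() for x, y in zip(cols[j], cols[k])))
             for k in range(n)] for j in range(n)]


def mean_square_coherence(mu):
    """Average of mu[j][k]^2 over all ordered pairs j != k."""
    n = len(mu)
    total = sum(mu[j][k] ** 2 for j in range(n) for k in range(n) if j != k)
    return total / (n * (n - 1))


N = 12
rng = random.Random(0)                      # fixed seed, chosen arbitrarily
rows = sorted(rng.sample(range(N), 5))      # the index set M
mu = pairwise_coherence(harmonic_frame_columns(N, rows))
empirical = mean_square_coherence(mu)
predicted = (N - len(rows)) / ((N - 1) * len(rows))
print(empirical, predicted)
```

The same `mu` table can be used to check coherence invariance by comparing the sorted off-diagonal entries of each row.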



Chirp Matrices: Let $m$ be a prime. An $m \times m^2$ "chirp matrix" $\Phi$ is defined by $\Phi_{t,am+b} = \frac{1}{\sqrt{m}}e^{2\pi i(bt^2+at)/m}$ for $t, a, b = 1, \dots, m$. The coherence between each pair of column vectors is known to be

fJ>jk = A= (j + k), 



from which we immediately obtain the inequalities $\mu \le 1/\sqrt{m}$ and $\bar\mu^2 \le 1/m$. More details on these frames are given, e.g., in [9], [22].
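
The two inequalities for chirp matrices can be checked by brute force for a small odd prime. This is a sketch under the assumption that the entries are $\frac{1}{\sqrt{m}}e^{2\pi i(bt^2+at)/m}$; indices run over $0, \dots, m-1$ instead of $1, \dots, m$, which gives the same columns because the exponent depends on $t$, $a$, $b$ only modulo $m$. The helper names are ours.

```python
import cmath
import math


def chirp_columns(m):
    """All m^2 chirp columns, indexed by chirp rate b and base frequency a."""
    cols = []
    for b in range(m):
        for a in range(m):
            cols.append([cmath.exp(2j * math.pi * ((b * t * t + a * t) % m) / m)
                         / math.sqrt(m) for t in range(m)])
    return cols


def coherence_and_mean_square(cols):
    """Return (max coherence, mean square coherence) over distinct column pairs."""
    n = len(cols)
    mu_max, total = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            mu_ij = abs(sum(x * y.conjugate() for x, y in zip(cols[i], cols[j])))
            mu_max = max(mu_max, mu_ij)
            total += mu_ij ** 2
    return mu_max, total / (n * (n - 1))


m = 5                                   # a small odd prime, chosen for illustration
mu_max, mean_sq = coherence_and_mean_square(chirp_columns(m))
print(mu_max, 1 / math.sqrt(m))
print(mean_sq, 1 / m)
```

Columns that share the chirp rate are orthogonal, which is why the mean square coherence falls strictly below $1/m$ in this example.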

Equiangular tight frames (ETFs): A matrix $\Phi$ is called an ETF if its columns $\{\phi_i \in \mathbb{R}^m,\ i = 1, \dots, N\}$ satisfy the following two conditions:

• $\|\phi_i\|_2 = 1$ for $i = 1, \dots, N$.

• $\mu_{ij} = \sqrt{\frac{N-m}{m(N-1)}}$ for $i \ne j$.

From this definition we obtain $\|\Phi\| = \sqrt{N/m}$ and $\theta = \bar\mu^2 = \frac{N-m}{m(N-1)}$. The entry in the table also covers the recent construction of ETFs from Steiner systems [28].

Reed-Muller matrices: In Table 1 we list two tight frames obtained from binary codes. The Reed-Muller matrices are obtained from certain special subcodes of the second-order Reed-Muller codes [39]; their coherence parameter $\mu$

Name                             R/C   Coherence-Invariant   Dimensions
Normalized Gaussian (G)          R     No                    $m \times N$
Random harmonic (RH)             C     Yes                   $|\mathcal{M}| \times N$, $\frac{1}{2}m \le |\mathcal{M}| \le \frac{3}{2}m$
Chirp (C)                        C     Yes                   $m \times m^2$
ETF (including Steiner)          C     Yes                   $\sqrt{N} \le m \le N$
Reed-Muller (RM)                 R     Yes                   $2^s \times 2^{t(1+s)}$
Delsarte-Goethals set (DG)       R     Yes                   $2^{2s+2} \times 2^{2(s+1)(r+2)-r}$
Deterministic subFourier (SF)    C     Yes                   $m \times p$



< y-l.">l.-, s A 

— -Jrn- TOTog JV 



mJV — |A1|(JV-1) 

1 1 

y^Tl m + 1 



i(JV-l) 
1 

V2=-2t-l 
2^— s— l 



e 3d m -l/(9d 2 logd) 



< 2~ a 



< 



1 



Name 



11*11 



Restrictions 



Probability 


Sketch dimensions: m = H(-) 


> 1- ±± 
— JV 


fc log 2 JV 


> 1 - A _ 1 

- 1 JV JV^ 


fclog 2 JV 


deterministic 


k log m 


deterministic 


k log m 


deterministic 


fc log 3 JV 


deterministic 


k log 3 JV 


deterministic 


9d 2 log d 
(fclog 2 JV) 4 



RH 

C 

ETF 
RM 
DG 

SF 



2JV 



' JV 
m 
2*s/2 

2 (s+l)(r+l)-r/2 



' A/(JV-1) 
N-M 



601ogJV<m< 
16 log JV < m < 4- 



m is pnme 

' (JV-m)(JV-l) 



/p/m 



are odd integers 

* < s/4 
r < s/2 

p is prime, p 1 /( d_1 ) < rn < p 



TABLE 1. Examples for Prop. 5.1: Classes of sampling matrices satisfying the incoherence conditions



is found in [4] and the mean square coherence is found from (30). The Delsarte-Goethals matrices are also based on some subcodes of the second order Reed-Muller codes and were discussed earlier in this section. Both dictionaries support orthogonal arrays, and therefore, form unit-norm tight frames (rows of the matrix $\Phi$ are pairwise orthogonal), with a consequence that $\|\Phi\| = \sqrt{N/m}$. We include these two examples out of many other possibilities based on codes because they appear in earlier works, and because their parameters are in the range that fits well our conditions.

We note that the quaternary version of these frames is also of interest in the context of sparse recovery; see in particular [12].

Deterministic Fourier Construction [31]: Let $p > 2$ be a prime, and let $f(x) \in \mathbb{F}_p[x]$ be a polynomial of degree $d > 2$ over the finite field $\mathbb{F}_p$. Suppose that $m$ is some integer satisfying $p^{1/(d-1)} \le m \le p$. Then we can construct an $m \times p$ deterministic RIP matrix from a $p \times p$ DFT matrix by keeping only the rows with indices in $\{f(n) \pmod{p},\ n = 1, \dots, m\}$, and normalizing the columns of the resulting matrix. It is known that this matrix has mutual coherence no greater than $e^{3d}m^{-1/(9d^2\log d)}$. Even though this bound is an artifact of the proof technique used in [31], there seem to be no obvious ways of improving it.

5.2. StRIP matrices from orthogonal arrays. Let us briefly consider another way of constructing StRIP matrices based on elementary arguments. Let $C = \{\phi_1, \dots, \phi_N\}$ be a collection of binary $m$-vectors. We assume that the entries of the vectors are of the form $\pm1/\sqrt{m}$ and denote the correlation of $\phi_i$ and $\phi_j$ by $\mu_{ij} = |\langle\phi_i, \phi_j\rangle|$.

The set $C$ is called an orthogonal array of strength $t$ if every subset of $r \le t$ coordinates of the vectors of $C$ supports a uniformly random binary $r$-vector. A good reference for orthogonal arrays is the book by Hedayat et al. [32]. An orthogonal array has the property that any $t$ coordinates of a randomly chosen vector behave as independent random variables (therefore, of course, $t$ is much smaller than $m$). In particular, the first $t$ moments of the distance distribution of $C$ are equal to the moments of the binomial distribution. Let $d_{ij} = \frac{m}{2}(1 - \phi_i^T\phi_j)$ be the Hamming distance between $\phi_i$ and $\phi_j$.

Lemma 5.3. (Pless identities, e.g. [39, p. 132]) Let $C$ be an orthogonal array of strength $t$. Let $B_w = (1/N)|\{(\phi_i, \phi_j) \in C^2 \mid d_{ij} = w\}|$ be the number of pairs of vectors in $C$ at distance $w$. For all $l = 1, 2, \dots, t$

(31) $$\frac{1}{N}\sum_{w=0}^{m}B_w\Big(\frac{m}{2} - w\Big)^l = 2^{-m}\sum_{w=0}^{m}\binom{m}{w}\Big(\frac{m}{2} - w\Big)^l.$$

We will need a manageable estimate of the right-hand side of (31). We quote from [39, p. 288]: let $l \ge 2$ be even, then

(32) $$2^{-m}\sum_{w=0}^{m}\binom{m}{w}\Big(\frac{m}{2} - w\Big)^l \le \Big(\frac{lm}{4e}\Big)^{l/2}e^{1/6}\sqrt{l}.$$

The main result of this section is given by the following theorem. 

Theorem 5.4. Let $C$ be an orthogonal array of strength $t$ and cardinality $N$ and let $l \le t$ be even. If $m \ge (3/4)\,l\,(k/\delta)^2(k/\epsilon)^{2/l}$ then $\Phi$ is $(k, \delta, \epsilon)$-StRIP.

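Proof. Let $I \subset [N]$ be a uniformly random $k$-subset. We clearly have

$$\lambda_{\min}(\Phi_I^T\Phi_I)\|x\|_2^2 \le \|\Phi_Ix\|_2^2 \le \lambda_{\max}(\Phi_I^T\Phi_I)\|x\|_2^2,$$

where $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ are the minimum and maximum eigenvalues of the argument. By the Gershgorin theorem, any eigenvalue $\lambda$ of the Gram matrix $\Phi_I^T\Phi_I$ satisfies

$$|\lambda - 1| \le \sum_{j\in I\setminus i}\mu_{ij}$$

for some $i \in I$, where we used the notation $\mu_{ij} := |\phi_i^T\phi_j|$. Now consider the probability that for some $i \in I$ the sum $\sum_{j\in I\setminus i}\mu_{ij} > \delta$. The proof will be finished if we show that this probability is less than $\epsilon$. Let $I = \{i_1, \dots, i_k\}$. We have

$$P_{R_k}\Big(\exists\, i \in I : \sum_{j\in I\setminus i}\mu_{ij} > \delta\Big) \le k\,P_{R_k}\Big(\sum_{j=2}^{k}\mu_{i_1,i_j} > \delta\Big) \le k\,\delta^{-l}\,\mathrm{E}_{R_k}\Big(\sum_{j=2}^{k}\mu_{i_1,i_j}\Big)^l$$
$$\le k(k-1)^{l-1}\delta^{-l}\,\mathrm{E}_{R_k}\sum_{j=2}^{k}\mu_{i_1,i_j}^l,$$

where the last step uses convexity of the function $z \mapsto z^l$. The trick is to show that the expectation on the last line, presently computed over the choice of $I$, can be also found with respect to a pair of random uniform elements of $C$ chosen without replacement. This is established in the next calculation:

$$\mathrm{E}_{R_k}\sum_{j=2}^{k}\mu_{i_1,i_j}^l = \frac{1}{\binom{N}{k}}\sum_{i_1<i_2<\dots<i_k}\sum_{j=2}^{k}\mu_{i_1,i_j}^l = \frac{1}{\binom{N}{k}k!}\sum_{i_1\ne i_2\ne\dots\ne i_k}\sum_{j=2}^{k}\mu_{i_1,i_j}^l$$
$$= \frac{1}{N(N-1)}\sum_{j=2}^{k}\sum_{i_1=1}^{N}\sum_{i_j\ne i_1}\mu_{i_1,i_j}^l$$

(33) $$= (k-1)\,\mathrm{E}\mu_{ij}^l,$$

where the expectation on the last line (and below in the proof) is computed with respect to a pair of uniformly chosen distinct random vectors from $C$. Next using (31) and switching to the variable $w = (m/2)(1 - \mu)$, we obtain

$$\mathrm{E}\mu_{ij}^l = \Big(\frac{2}{m}\Big)^l\sum_{w=1}^{m}\frac{B_w}{N-1}\Big(\frac{m}{2} - w\Big)^l$$
$$= \Big(\frac{2}{m}\Big)^l\Big(\frac{N}{N-1}\cdot\frac{1}{N}\sum_{w=0}^{m}B_w\Big(\frac{m}{2} - w\Big)^l - \frac{1}{N-1}\Big(\frac{m}{2}\Big)^l\Big)$$
$$= \Big(\frac{2}{m}\Big)^l\Big(\frac{N}{N-1}\,2^{-m}\sum_{w=0}^{m}\binom{m}{w}\Big(\frac{m}{2} - w\Big)^l - \frac{1}{N-1}\Big(\frac{m}{2}\Big)^l\Big).$$

Now we can use (32) and $l < m$ to write

$$\mathrm{E}\mu_{ij}^l \le \Big(\frac{l}{em}\Big)^{l/2}e^{1/6}\sqrt{l}\,\frac{N}{N-1} - \frac{1}{N-1} \le e^{1/6}\,l^{(l+1)/2}(em)^{-l/2}.$$

Conclude using the condition on $m$:

$$P_{R_k}\Big(\exists\, i \in I : \sum_{j\in I\setminus i}\mu_{ij} > \delta\Big) \le k^{l+1}\delta^{-l}e^{1/6}\,l^{(l+1)/2}(em)^{-l/2} \le \epsilon.$$

■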

Observe that the condition of this theorem is nonasymptotic, and is satisfied by a number of known constructions 
of orthogonal arrays. 

Example: Consider sampling matrices obtained from the binary Delsarte-Goethals codes already mentioned above; see Eq. (29). It is known that the underlying code forms an orthogonal array of strength $t = 7$, so taking $l = 6$ we obtain a family of $(k, \delta, \epsilon)$-StRIP matrices of dimensions $m \times N$ for sparsity

$$k \le 0.52\,(\delta^6\epsilon m^3)^{1/7} = 0.52\,(\delta^6\epsilon)^{1/7}(2^rN)^{3/(7(r+2))}.$$

The case $r = 0$ was considered in [13] where these matrices were analyzed based on the detailed properties of this particular case of the construction. Our computation, while somewhat crude, permits a uniform estimate for the entire family of matrices. The estimate can be improved if the expectation $\mathrm{E}\mu_{ij}^l$ can be computed explicitly from the known distribution of correlations. For instance, taking $r = 1$ and using the distribution given in [39, p. 477] we obtain that $\mathrm{E}\mu^6 \approx (4/3)m^{-3}$. With this, the condition on sparsity that emerges has the form $k \le 0.95\,(\delta^6\epsilon m^3)^{1/7}$, with a better constant compared to the general estimate. For instance, we obtain $m \times (m^3/2)$ matrices with the $(k, \delta, 0.001)$ StRIP property for all $k \le 0.35\,\delta^{6/7}m^{3/7}$.
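
The constants quoted in this example follow from short arithmetic, which the sketch below reproduces. It assumes the condition $m \ge (3/4)\,l\,(k/\delta)^2(k/\epsilon)^{2/l}$ of Theorem 5.4 for the general estimate, and the requirement $k^{l+1}\delta^{-l}\,\mathrm{E}\mu^l \le \epsilon$ with $\mathrm{E}\mu^6 \approx (4/3)m^{-3}$ for the refined one; both function names are illustrative.

```python
import math


def general_constant(l):
    """Constant c in k <= c (delta^(2l) eps^2 m^l)^(1/(2l+2)), from m >= (3/4) l (k/delta)^2 (k/eps)^(2/l)."""
    return (0.75 * l) ** (-l / (2 * l + 2))


def refined_constant(moment_coefficient, l):
    """Constant c in k <= c (delta^l eps m^(l/2))^(1/(l+1)) when E mu^l = coefficient * m^(-l/2)."""
    return moment_coefficient ** (-1.0 / (l + 1))


c_general = general_constant(6)                    # strength t = 7, l = 6
c_refined = refined_constant(4.0 / 3.0, 6)         # r = 1 case
c_eps = c_refined * 0.001 ** (1.0 / 7.0)           # eps = 0.001 absorbed into the constant

# Round down to two decimals, as a safe (smaller) constant.
for name, value in [("general", c_general), ("refined", c_refined), ("eps=0.001", c_eps)]:
    print(name, math.floor(value * 100) / 100)
```

Rounding down keeps each inequality valid, which matches the constants 0.52, 0.95 and 0.35 stated in the example.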

Another similar possibility arises if $C$ is taken to be a binary dual BCH code with $m = 2^s - 1$, $N = m^r$, $\mu = 2(r-1)m^{-1/2}$, $r = 1, 2, 3, \dots$. Many more such constructions can be obtained from other algebraic codes such as the
Kerdock codes, Gold codes, etc. 11331 . This lends further support to earlier studies of sampling matrices constructed 
from the BCH codes (T), Delsarte-Goethals codes, and other binary codes related to the second-order Reed-Muller 

codes nana. 

It would be desirable to show that orthogonal arrays also suffice for the SINC property; however, the technique introduced above results in parameters that contradict the Rao bound on the number of rows in an array [32]. Thus, we are unable to show that this construction results in matrices that are good for linear estimators.

5.3. Further constructions from binary codes. We remark that it is easy to show existence of matrices with low 
coherence. The following observation is a rephrasing of the result known in coding theory as the Gilbert- Varshamov 
existence bound for binary linear codes. 

Proposition 5.5. Let I = log 2 N,l < m and let G = (g 11 . . . , g t ) be an m x I binary matrix whose rows are chosen 
independently and uniformly from F 2 . Let to = 41og N / fi 2 , where < p, < 1. Form the matrix <£> by constructing an 
^2-linear span of the columns ofG and using the map {0, 1} — > {^= f "^^l- Then $ has coherence /j, with probability 
at least 1 — 2/N and mean square coherence p 2 < 1/m with probability at least (1 — (m/N)) m . 

Proof. Note that the Hamming distance $d$ between any two columns of a matrix with coherence $\mu$ satisfies $\mu \ge 
|1 - 2d/m|$. The set of columns of $C$ forms a linear space, so it suffices to argue about Hamming weights rather than 
pairwise correlations. Let $u \in \{0, 1\}^l$ be a nonzero vector; then the probability that the vector $v = Gu$ has weight $w$ 
equals $\binom{m}{w} 2^{-m}$. Let $X$ be the random number of columns whose weight $w$ satisfies $|w - m/2| \ge m\mu/2$. We have 

(34) $\mathbb{E}X \le 2N \sum_{w=0}^{m(1-\mu)/2} \binom{m}{w} 2^{-m} \le N 2^{1 - m(1 - h((1-\mu)/2))}$, 

where $h(x) = -x\log_2 x - (1 - x)\log_2(1 - x)$ is the binary entropy function. Using the inequality 

$1 - h(1/2 - x) \ge 2x^2/\log 2, \quad 0 < x < 1/2$, 

and the condition for $\mu$, we obtain $\mathbb{E}X \le 2/N$. Since $P(X > 0) \le \mathbb{E}X$, this implies the first claim. The second part 
follows because there are $\prod_{i=1}^{m}(N - i)$ matrices $G$ with distinct nonzero rows. ∎ 
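The construction in Proposition 5.5 can be tried numerically for small parameters. The following is a minimal sketch, not the authors' code: the helper names `codebook_matrix` and `coherence` are hypothetical, the assignment of 0 to $+1/\sqrt{m}$ and 1 to $-1/\sqrt{m}$ is an assumed reading of the map, and $\log$ is taken to be the natural logarithm.

```python
import itertools
import math
import numpy as np

def codebook_matrix(G):
    """Columns are G u (mod 2) over all u in {0,1}^l, mapped 0 -> +1/sqrt(m), 1 -> -1/sqrt(m)."""
    m, l = G.shape
    U = np.array(list(itertools.product([0, 1], repeat=l))).T  # l x N, N = 2^l
    C = (G @ U) % 2                                            # m x N binary codewords
    return (1 - 2 * C) / math.sqrt(m)

def coherence(Phi):
    """Largest absolute inner product between two distinct columns."""
    gram = Phi.T @ Phi
    np.fill_diagonal(gram, 0.0)
    return np.abs(gram).max()

# Illustrative parameters (assumed): l = 6, so N = 64, and target coherence mu = 0.5.
l, mu = 6, 0.5
N = 2 ** l
m = math.ceil(4 * math.log(N) / mu ** 2)
rng = np.random.default_rng(0)
G = rng.integers(0, 2, size=(m, l))      # rows drawn uniformly from F_2^l
Phi = codebook_matrix(G)
print(m, N, coherence(Phi))
```

Because the columns come from a linear space, the coherence computed from the Gram matrix coincides with the largest value of $|1 - 2w/m|$ over the weights $w$ of the nonzero codewords, which is the reduction used in the proof.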

The derandomizing of Gilbert-Varshamov codes was recently addressed by Porat and Rothschild [43]. They presented 
an $O(mN)$ deterministic algorithm that constructs codes with large minimum distance. To construct incoherent 
dictionaries, we need a bit more, namely that all the pairwise distances are in a narrow segment around $m/2$. The 
algorithm in [43] can be easily tailored to do this. A simplified version of this procedure, which results in an algorithm 
of complexity $O(mN^2)$ (i.e., not as good as in [43]), was given in [40]. In a nutshell it is as follows. Instead of 
constructing the $m \times N$ matrix, $N = 2^l$, we aim at constructing a basis of the space of columns, i.e., an $m \times l$ matrix 
$G$. The rows of $G$ are selected recursively. Before any rows are selected, the expected number of codewords of weight 
far from $m/2$ is given by (34). The algorithm selects rows one by one so that the expectation of the number of outlying 
vectors conditional on the rows already chosen is the smallest possible. 

We note that in the context of sparse recovery, the dependence between N and m is likely to be polynomial. In this 
range of parameters the above complexity is acceptable and is in fact comparable with the size of the matrix $\Phi$, which 
needs to be stored for sampling and processing. 